no code implementations • 21 Mar 2024 • Lizhe Liu, Bohua Wang, Hongwei Xie, Daqi Liu, Li Liu, Zhiqiang Tian, Kuiyuan Yang, Bing Wang
Vision-centric 3D environment understanding is both vital and challenging for autonomous driving systems.
no code implementations • 15 Jun 2023 • Mingjie Pan, Li Liu, Jiaming Liu, Peixiang Huang, Longlong Wang, Shanghang Zhang, Shaoqing Xu, Zhiyi Lai, Kuiyuan Yang
In this technical report, we present our solution, named UniOCC, for the Vision-Centric 3D occupancy prediction track in the nuScenes Open Dataset Challenge at CVPR 2023.
Ranked #3 on Prediction Of Occupancy Grid Maps on Occ3D-nuScenes
no code implementations • 26 Feb 2023 • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui
The image and query are mapped to a common vector space via these two parts respectively, and image-query similarity is naturally defined as an inner product of their mappings in the space.
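As a minimal sketch of the inner-product similarity described above: two separate mappings send image features and query features into one shared vector space, and the score is the dot product of the two results. The linear form of the mappings, the toy weights `IMAGE_MAP` and `QUERY_MAP`, and the feature values are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def project(weights: Matrix, features: Sequence[float]) -> List[float]:
    """Apply a linear map: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def similarity(image_vec: Sequence[float], query_vec: Sequence[float]) -> float:
    """Inner product of two vectors that live in the same space."""
    return sum(a * b for a, b in zip(image_vec, query_vec))


# Hypothetical toy weights: a 3-d image feature and a 2-d query feature
# are both mapped into a shared 2-d space.
IMAGE_MAP: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
QUERY_MAP: Matrix = [[0.5, 0.5], [0.0, 2.0]]

image_embedding = project(IMAGE_MAP, [1.0, 2.0, 3.0])  # [1.0, 5.0]
query_embedding = project(QUERY_MAP, [2.0, 1.0])       # [1.5, 2.0]
score = similarity(image_embedding, query_embedding)   # 11.5
```

Because the score is a plain inner product, ranking images for a query reduces to computing one query embedding and taking dot products against stored image embeddings.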
1 code implementation • 10 Jul 2022 • Xiangtai Li, Jiangning Zhang, Yibo Yang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, DaCheng Tao
In this paper, we focus on exploring effective methods for fast, accurate, and domain-agnostic semantic segmentation.
1 code implementation • 28 Jul 2021 • Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, DaCheng Tao
To incorporate both temporal and scale information, we propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.
1 code implementation • 28 Jul 2021 • Xiangtai Li, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Xiatian Zhu, Tao Xiang
Modelling long-range contextual relationships is critical for pixel-wise prediction tasks such as semantic segmentation.
1 code implementation • 6 Nov 2020 • Xiangtai Li, Xia Li, Ansheng You, Li Zhang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Zhouchen Lin
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and perform reasoning within the single vector where the computation cost can be significantly reduced.
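The squeeze-then-reason idea above can be sketched roughly as follows. The abstract does not say how the squeeze or the reasoning step is computed, so this sketch assumes global average pooling for the squeeze and a hypothetical channel-mixing matrix for the reasoning. The two cost functions are rough multiply-add counts under the further assumption that spatial propagation compares every pair of positions.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def squeeze(feature_map: FeatureMap) -> List[float]:
    """Assumed squeeze: global average pooling, one value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def reason(vector: List[float], mixing: List[List[float]]) -> List[float]:
    """Hypothetical reasoning step: mix channels with a C x C matrix."""
    return [sum(w * v for w, v in zip(row, vector)) for row in mixing]


def pairwise_spatial_cost(channels: int, height: int, width: int) -> int:
    """Rough cost of relating every spatial position to every other one."""
    return (height * width) ** 2 * channels


def squeezed_cost(channels: int, height: int, width: int) -> int:
    """Rough cost of pooling once plus one C x C mixing on the vector."""
    return height * width * channels + channels * channels


features: FeatureMap = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
global_vector = squeeze(features)                             # [4.0, 2.0]
reasoned = reason(global_vector, [[1.0, 0.0], [1.0, 1.0]])    # [4.0, 6.0]
```

Under these assumed counts, a 256-channel 64x64 feature map costs 4,294,967,296 multiply-adds for pairwise spatial propagation and 1,114,112 for the squeezed version.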
1 code implementation • ECCV 2020 • Chang Shu, Kun Yu, Zhixiang Duan, Kuiyuan Yang
Photometric loss is widely used for self-supervised depth and egomotion estimation.
6 code implementations • ECCV 2020 • Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong
A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation.
Ranked #2 on Real-Time Semantic Segmentation on Cityscapes test
2 code implementations • 16 Sep 2019 • Xiangtai Li, Li Zhang, Ansheng You, Maoke Yang, Kuiyuan Yang, Yunhai Tong
GALD is end-to-end trainable and can be easily plugged into existing FCNs with various global aggregation modules for a wide range of vision tasks, and consistently improves the performance of state-of-the-art object detection and instance segmentation approaches.
Ranked #1 on Semantic Segmentation on PASCAL VOC 2007
6 code implementations • 13 Sep 2019 • Li Zhang, Xiangtai Li, Anurag Arnab, Kuiyuan Yang, Yunhai Tong, Philip H. S. Torr
Exploiting long-range contextual information is key for pixel-wise prediction tasks such as semantic segmentation.
Ranked #32 on Semantic Segmentation on Cityscapes test
2 code implementations • 9 Sep 2019 • Youmin Zhang, Yimin Chen, Xiao Bai, Suihanjin Yu, Kun Yu, Zhiwei Li, Kuiyuan Yang
However, disparity is just a byproduct of a matching process modeled by cost volume, while indirectly learning cost volume driven by disparity regression is prone to overfitting since the cost volume is under-constrained.
2 code implementations • 3 Apr 2019 • Xiangtai Li, Houlong Zhao, Lei Han, Yunhai Tong, Kuiyuan Yang
Semantic segmentation generates a comprehensive understanding of scenes by densely predicting the category of each pixel.
Ranked #29 on Semantic Segmentation on Cityscapes test
1 code implementation • CVPR 2018 • Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang
To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size.
Ranked #5 on Semantic Segmentation on SkyScapes-Dense
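To illustrate why dense connections can cover a scale range more densely than parallel branches, the sketch below counts receptive field sizes. It relies on two standard facts: an atrous convolution with kernel size K and dilation d has receptive field (K - 1) * d + 1, and stacking layers adds their receptive fields minus one per extra layer. The dilation rates (3, 6, 12) and the kernel size of 3 are assumptions chosen for a small example, and treating every ordered subset of layers as a path is a simplified reading of dense connectivity.

```python
from itertools import combinations
from typing import List, Sequence


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Receptive field of a single atrous convolution layer."""
    return (kernel_size - 1) * dilation + 1


def stacked_receptive_field(fields: Sequence[int]) -> int:
    """Receptive field of several layers applied one after another."""
    return sum(fields) - (len(fields) - 1)


def parallel_scales(dilations: Sequence[int], kernel_size: int = 3) -> List[int]:
    """Scales seen when each atrous layer only reads the input directly."""
    return sorted(receptive_field(kernel_size, d) for d in dilations)


def dense_scales(dilations: Sequence[int], kernel_size: int = 3) -> List[int]:
    """Scales seen when every layer also reads all earlier layers' outputs,
    so any ordered subset of layers forms a path from input to output."""
    fields = [receptive_field(kernel_size, d) for d in dilations]
    scales = set()
    for size in range(1, len(fields) + 1):
        for path in combinations(fields, size):
            scales.add(stacked_receptive_field(path))
    return sorted(scales)


print(parallel_scales([3, 6, 12]))  # [7, 13, 25]
print(dense_scales([3, 6, 12]))     # [7, 13, 19, 25, 31, 37, 43]
```

With the same three layers, the densely connected version reaches a larger maximum scale and fills in intermediate scales, without adding layers.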
no code implementations • 28 Aug 2017 • Yalong Bai, Kuiyuan Yang, Tao Mei, Wei-Ying Ma, Tiejun Zhao
Large-scale image datasets and deep convolutional neural networks (DCNNs) are the two primary driving forces behind the rapid progress made in generic object recognition tasks in recent years.
no code implementations • ICLR 2018 • Yuhui Yuan, Kuiyuan Yang, Chao Zhang
Thus, we propose feature incay to also regularize representation learning, which favors feature vectors with large norm when the samples can be correctly classified.
1 code implementation • ICCV 2017 • Yuhui Yuan, Kuiyuan Yang, Chao Zhang
This motivates us to ensemble a set of models with different complexities in a cascaded manner and mine hard examples adaptively: a sample is judged by a series of models with increasing complexity and only updates the models that consider the sample a hard case.
Ranked #14 on Image Retrieval on SOP
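The cascaded judging described above might be sketched as follows. The numeric difficulty score, the threshold-based judges, and the early stop at the first model that finds the sample easy are all assumptions made for illustration; the abstract only states that a sample updates the models that consider it a hard case.

```python
from typing import Callable, List, Sequence

Judge = Callable[[float], bool]  # returns True if the sample is hard


def models_to_update(difficulty: float, judges: Sequence[Judge]) -> List[int]:
    """Return indices of the cascade models that this sample should update.

    Judges are ordered from least to most complex.
    """
    selected: List[int] = []
    for index, is_hard in enumerate(judges):
        if not is_hard(difficulty):
            break  # assumption: an easy verdict ends the cascade early
        selected.append(index)
    return selected


# Hypothetical thresholds: more complex models treat fewer samples as hard.
judges: List[Judge] = [lambda d, t=t: d > t for t in (0.2, 0.5, 0.8)]

easy = models_to_update(0.1, judges)    # []
medium = models_to_update(0.6, judges)  # [0, 1]
hard = models_to_update(0.9, judges)    # [0, 1, 2]
```

In this sketch, easy samples touch no model, while only the hardest samples reach and update the most complex one.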
no code implementations • CVPR 2016 • Chuang Gan, Ting Yao, Kuiyuan Yang, Yi Yang, Tao Mei
The Web images are then filtered by the learnt network and the selected images are additionally fed into the network to enhance the architecture and further trim the videos.
no code implementations • 20 Dec 2014 • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui
Convolutional Neural Networks (CNNs) have achieved error rates comparable to those of well-trained humans on the ILSVRC2014 image classification task.
no code implementations • 24 Nov 2014 • Yichong Xu, Tianjun Xiao, Jiaxing Zhang, Kuiyuan Yang, Zheng Zhang
Even though convolutional neural networks (CNNs) have achieved near-human performance in various computer vision tasks, their ability to tolerate scale variations is limited.
no code implementations • CVPR 2015 • Tianjun Xiao, Yichong Xu, Kuiyuan Yang, Jiaxing Zhang, Yuxin Peng, Zheng Zhang
Our pipeline integrates three types of attention: the bottom-up attention that proposes candidate patches, the object-level top-down attention that selects patches relevant to a certain object, and the part-level top-down attention that localizes discriminative parts.
no code implementations • 17 Dec 2013 • Yalong Bai, Kuiyuan Yang, Wei Yu, Wei-Ying Ma, Tiejun Zhao
Image retrieval refers to finding relevant images in an image database for a query, which is considered difficult because of the gap between the low-level representation of images and the high-level representation of queries.