2 code implementations • 15 Mar 2024 • Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process.
1 code implementation • 14 Mar 2024 • Guo Chen, Yifei HUANG, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, LiMin Wang
We categorize Mamba into four roles for modeling videos, derive a Video Mamba Suite composed of 14 models/modules, and evaluate them on 12 video understanding tasks.
Ranked #1 on Temporal Action Localization on FineAction
no code implementations • 29 Feb 2024 • Zexi Li, Jie Lin, Zhiqi Li, Didi Zhu, Chao Wu
To bridge the gap between linear mode connectivity (LMC) and federated learning (FL), we leverage fixed anchor models to empirically and theoretically study the transitivity property of connectivity from two models (LMC) to a group of models (model fusion in FL).
no code implementations • 2 Feb 2024 • Zexi Li, Zhiqi Li, Jie Lin, Tao Shen, Tao Lin, Chao Wu
In deep learning, stochastic gradient descent often yields functionally similar yet widely scattered solutions in the weight space even under the same initialization, causing barriers in the Linear Mode Connectivity (LMC) landscape.
1 code implementation • 11 Jan 2024 • Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai
The advancements in speed and efficiency of DCNv4, combined with its robust performance across diverse vision tasks, show its potential as a foundational building block for future vision models.
1 code implementation • 14 Dec 2023 • Wenhai Wang, Jiangwei Xie, Chuanyang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai
In this work, we delve into the potential of large language models (LLMs) in autonomous driving (AD).
1 code implementation • 5 Dec 2023 • Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez
We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity.
no code implementations • 27 Nov 2023 • Yiming Chen, Zhiqi Li, Peidong Liu
The main insight is that we exploit the images generated by a large pre-trained text-to-image diffusion model to supervise the training of a text-conditioned 3D generative adversarial network.
1 code implementation • 24 Nov 2023 • Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
Our approach enables the generation of controllable multi-view images and view-consistent 3D content.
1 code implementation • 21 Nov 2023 • Cheng Wan, Hongyuan Yu, Zhiqi Li, Yihang Chen, Yajun Zou, Yuqing Liu, Xuanwu Yin, Kunlong Zuo
To address this issue, we propose the Swift Parameter-free Attention Network (SPAN), a highly efficient SISR model that balances parameter count, inference speed, and image quality.
Ranked #37 on Image Super-Resolution on Set14 - 4x upscaling
1 code implementation • NeurIPS 2023 • Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li
Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert).
Ranked #6 on 3D Object Detection on nuScenes Camera Only
1 code implementation • ICCV 2023 • Zhiqi Li, Zhiding Yu, Wenhai Wang, Anima Anandkumar, Tong Lu, Jose M. Alvarez
Currently, the two most prominent VTM paradigms are forward projection and backward projection.
1 code implementation • 4 Jul 2023 • Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez
This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving.
Ranked #1 on Prediction Of Occupancy Grid Maps on Occ3D-nuScenes
no code implementations • 23 Mar 2023 • Yun Liu, Xuefeng Yan, Zhilei Chen, Zhiqi Li, Zeyong Wei, Mingqiang Wei
Self-supervised learning is attracting considerable attention in point cloud understanding.
no code implementations • 28 Feb 2023 • Yizhong Zhang, Zhiqi Li, Sicheng Xu, Chong Li, Jiaolong Yang, Xin Tong, Baining Guo
A key challenge in emulating the remote hand touch is the realistic rendering of the participant's hand and arm as the hand touches the screen.
2 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still at an early stage.
Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)
2 code implementations • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao
As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.
2 code implementations • 1 Sep 2022 • Jie Zhang, Zhiqi Li, Bo Li, Jianghe Xu, Shuang Wu, Shouhong Ding, Chao Wu
Extensive experiments on federated datasets and real-world datasets demonstrate that FedLC leads to a more accurate global model and much improved performance.
3 code implementations • 31 Mar 2022 • Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai
In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.
2 code implementations • CVPR 2022 • Zhiqi Li, Wenhai Wang, Enze Xie, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo, Tong Lu
Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner.
Ranked #4 on Panoptic Segmentation on COCO test-dev
1 code implementation • 14 Apr 2021 • Ruo-Ze Liu, Wenhai Wang, Yanjie Shen, Zhiqi Li, Yang Yu, Tong Lu
StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against the opponent's units.