Search Results for author: Zhiqi Li

Found 21 papers, 16 papers with code

Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

2 code implementations • 15 Mar 2024 • Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process.

3D Generation • Image to 3D • +1

139

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

1 code implementation • 14 Mar 2024 • Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang

We categorize Mamba into four roles for modeling videos, deriving a Video Mamba Suite composed of 14 models/modules, and evaluating them on 12 video understanding tasks.

Ranked #1 on Temporal Action Localization on FineAction

Moment Retrieval • Temporal Action Localization • +1

142

Improving Group Connectivity for Generalization of Federated Deep Learning

no code implementations • 29 Feb 2024 • Zexi Li, Jie Lin, Zhiqi Li, Didi Zhu, Chao Wu

Bridging the gap between LMC and FL, in this paper, we leverage fixed anchor models to empirically and theoretically study the transitivity property of connectivity from two models (LMC) to a group of models (model fusion in FL).

Federated Learning • Linear Mode Connectivity

Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion

no code implementations • 2 Feb 2024 • Zexi Li, Zhiqi Li, Jie Lin, Tao Shen, Tao Lin, Chao Wu

In deep learning, stochastic gradient descent often yields functionally similar yet widely scattered solutions in the weight space even under the same initialization, causing barriers in the Linear Mode Connectivity (LMC) landscape.

Federated Learning • Linear Mode Connectivity

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

1 code implementation • 11 Jan 2024 • Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

The advancements in speed and efficiency of DCNv4, combined with its robust performance across diverse vision tasks, show its potential as a foundational building block for future vision models.

Image Classification • Image Generation • +1

336

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

1 code implementation • 14 Dec 2023 • Wenhai Wang, Jiangwei Xie, Chuanyang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai

In this work, we delve into the potential of large language models (LLMs) in autonomous driving (AD).

Autonomous Driving • Motion Planning

124

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

1 code implementation • 5 Dec 2023 • Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

We initially observed that the nuScenes dataset, characterized by relatively simple driving scenarios, leads to an under-utilization of perception information in end-to-end models incorporating ego status, such as the ego vehicle's velocity.

Autonomous Driving

ET3D: Efficient Text-to-3D Generation via Multi-View Distillation

no code implementations • 27 Nov 2023 • Yiming Chen, Zhiqi Li, Peidong Liu

The main insight is that we exploit the images generated by a large pre-trained text-to-image diffusion model to supervise the training of a text-conditioned 3D generative adversarial network.

3D Generation • Generative Adversarial Network • +2

MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

1 code implementation • 24 Nov 2023 • Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

Our approach enables the generation of controllable multi-view images and view-consistent 3D content.

3D Generation • Text to 3D

139

Swift Parameter-free Attention Network for Efficient Super-Resolution

1 code implementation • 21 Nov 2023 • Cheng Wan, Hongyuan Yu, Zhiqi Li, Yihang Chen, Yajun Zou, Yuqing Liu, Xuanwu Yin, Kunlong Zuo

To address this issue, we propose the Swift Parameter-free Attention Network (SPAN), a highly efficient SISR model that balances parameter count, inference speed, and image quality.

Ranked #37 on Image Super-Resolution on Set14 - 4x upscaling

Image Super-Resolution

Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection

1 code implementation • NeurIPS 2023 • Linyan Huang, Zhiqi Li, Chonghao Sima, Wenhai Wang, Jingdong Wang, Yu Qiao, Hongyang Li

Current research is primarily dedicated to advancing the accuracy of camera-only 3D object detectors (apprentice) through the knowledge transferred from LiDAR- or multi-modal-based counterparts (expert).

Ranked #6 on 3D Object Detection on nuScenes Camera Only

3D Object Detection • object-detection

1,082

FB-BEV: BEV Representation from Forward-Backward View Transformations

1 code implementation • ICCV 2023 • Zhiqi Li, Zhiding Yu, Wenhai Wang, Anima Anandkumar, Tong Lu, Jose M. Alvarez

Currently, the two most prominent VTM paradigms are forward projection and backward projection.

551

FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation

1 code implementation • 4 Jul 2023 • Zhiqi Li, Zhiding Yu, David Austin, Mingsheng Fang, Shiyi Lan, Jan Kautz, Jose M. Alvarez

This technical report summarizes the winning solution for the 3D Occupancy Prediction Challenge, held in conjunction with the CVPR 2023 Workshop on End-to-End Autonomous Driving and the CVPR 2023 Workshop on Vision-Centric Autonomous Driving.

Ranked #1 on Prediction Of Occupancy Grid Maps on Occ3D-nuScenes

Autonomous Driving • Prediction Of Occupancy Grid Maps

551

PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds

no code implementations • 23 Mar 2023 • Yun Liu, Xuefeng Yan, Zhilei Chen, Zhiqi Li, Zeyong Wei, Mingqiang Wei

Self-supervised learning is attracting considerable attention in point cloud understanding.

Self-Supervised Learning

RemoteTouch: Enhancing Immersive 3D Video Communication with Hand Touch

no code implementations • 28 Feb 2023 • Yizhong Zhang, Zhiqi Li, Sicheng Xu, Chong Li, Jiaolong Yang, Xin Tong, Baining Guo

A key challenge in emulating the remote hand touch is the realistic rendering of the participant's hand and arm as the hand touches the screen.

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state.

Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)

Classification • Image Classification • +3

2,325

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe

2 code implementations • 12 Sep 2022 • Hongyang Li, Chonghao Sima, Jifeng Dai, Wenhai Wang, Lewei Lu, Huijie Wang, Jia Zeng, Zhiqi Li, Jiazhi Yang, Hanming Deng, Hao Tian, Enze Xie, Jiangwei Xie, Li Chen, Tianyu Li, Yang Li, Yulu Gao, Xiaosong Jia, Si Liu, Jianping Shi, Dahua Lin, Yu Qiao

As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view become vitally important.

Autonomous Driving

2,907

Federated Learning with Label Distribution Skew via Logits Calibration

2 code implementations • 1 Sep 2022 • Jie Zhang, Zhiqi Li, Bo Li, Jianghe Xu, Shuang Wu, Shouhong Ding, Chao Wu

Extensive experiments on federated datasets and real-world datasets demonstrate that FedLC leads to a more accurate global model and much improved performance.

Federated Learning

392

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

3 code implementations • 31 Mar 2022 • Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.

Ranked #2 on Bird's-Eye View Semantic Segmentation on Lyft Level 5