1 code implementation • 2 May 2024 • Yuan Tang, Xu Han, Xianzhi Li, Qiao Yu, Yixue Hao, Long Hu, Min Chen
Notably, MiniGPT-3D gains an 8.12 increase on GPT-4 evaluation score for the challenging object captioning task compared to ShapeLLM-13B, while the latter costs 160 total GPU-hours on 8 A800.
Ranked #1 on 3D Object Captioning on Objaverse
no code implementations • 5 Dec 2023 • Qiao Yu, Wengui Zhang, Jorge Cardoso, Odej Kao
In this paper, we present a comprehensive study on the correlation between CEs and UEs, specifically emphasizing the importance of spatio-temporal error bit information.
1 code implementation • IEEE Transactions on Multimedia 2023 • Yuan Tang, Xianzhi Li, Jinfeng Xu, Qiao Yu, Long Hu, Yixue Hao, Min Chen
In our work, we present Point-LGMask, a novel method to embed both local and global contexts with multi-ratio masking, which is quite effective for self-supervised feature learning of point clouds but is unfortunately ignored by existing pre-training works.
Ranked #3 on Few-Shot 3D Point Cloud Classification on ModelNet40 5-way (10-shot) (using extra training data)
1 code implementation • 9 Mar 2023 • Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu
To alleviate this, previous methods simply replace the pixel reconstruction targets of 75% masked tokens by encoded features from pre-trained image-image (DINO) or image-language (CLIP) contrastive learning.
1 code implementation • 24 Nov 2022 • Jinfeng Xu, Xianzhi Li, Yuan Tang, Qiao Yu, Yixue Hao, Long Hu, Min Chen
In our work, we present CasFusionNet, a novel cascaded network for point cloud semantic scene completion by dense feature fusion.
3 code implementations • 31 Mar 2022 • Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai
In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.
no code implementations • 7 May 2021 • Haoming Cai, Jingwen He, Qiao Yu, Chao Dong
The base networks comprise a generator and a discriminator.