1 code implementation • ICCV 2023 • Shuhong Zheng, Zhipeng Bao, Martial Hebert, Yu-Xiong Wang
To tackle the multi-task view synthesis (MTVS) problem, we propose MuvieNeRF, a framework that incorporates both multi-task and cross-view knowledge to simultaneously synthesize multiple scene properties.
5 code implementations • NeurIPS 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.
no code implementations • 9 Jun 2022 • Mingtong Zhang, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Yu-Xiong Wang
Comprehensive 3D scene understanding, both geometrically and semantically, is important for real-world applications such as robot perception.