1 code implementation • 7 Aug 2023 • Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu
The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally.
1 code implementation • CVPR 2023 • Ziqin Wang, Bowen Cheng, Lichen Zhao, Dong Xu, Yang Tang, Lu Sheng
Since 2D images provide rich semantics and scene graphs are naturally coupled with language, in this study we propose a Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme that significantly empowers 3DSSG prediction models to discriminate long-tailed and ambiguous semantic relations.
Ranked #1 on 3D Scene Graph Generation on 3DSSG (using extra training data)
1 code implementation • 24 Sep 2022 • Lichen Zhao, Daigang Cai, Jing Zhang, Lu Sheng, Dong Xu, Rui Zheng, Yinjie Zhao, Lipeng Wang, Xibo Fan
We also propose a new 3D VQA framework that effectively predicts completely visually grounded and explainable answers.
1 code implementation • 11 Mar 2022 • Yufeng Cui, Lichen Zhao, Feng Liang, Yangguang Li, Jing Shao
This is because researchers do not use consistent training recipes, and even train on different data, hampering fair comparison between methods.
no code implementations • CVPR 2022 • Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu
Observing that the 3D captioning and 3D grounding tasks naturally contain both shared and complementary information, in this work we propose a unified framework that jointly solves these two distinct but closely related tasks in a synergistic fashion, consisting of shared task-agnostic modules and lightweight task-specific modules.
3 code implementations • ICLR 2022 • Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan
Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks.
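The zero-shot recognition ability mentioned above can be illustrated with a minimal sketch: CLIP-style models classify an image by comparing its embedding against text embeddings of candidate prompts via cosine similarity. The embeddings and temperature below are toy values chosen for illustration, not outputs of an actual CLIP model.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, temperature=0.01):
    """CLIP-style zero-shot classification: pick the class whose text
    embedding has the highest cosine similarity to the image embedding."""
    # L2-normalize so dot products become cosine similarities
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = txt @ img / temperature
    # Softmax (numerically stable) over class logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

# Toy 4-dim embeddings standing in for prompts like
# "a photo of a cat" / "a photo of a dog" (hypothetical values)
image_emb = np.array([0.9, 0.1, 0.0, 0.1])
text_embs = np.array([[1.0, 0.0, 0.0, 0.0],   # "cat" prompt
                      [0.0, 1.0, 0.0, 0.0]])  # "dog" prompt
idx, probs = zero_shot_classify(image_emb, text_embs)
```

No task-specific training is needed: adding a new class only requires embedding a new text prompt, which is what makes the transferability to downstream tasks attractive.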
no code implementations • ICCV 2021 • Lichen Zhao, Daigang Cai, Lu Sheng, Dong Xu
Visual grounding on 3D point clouds is an emerging vision and language task that benefits various applications in understanding the 3D visual world.