Search Results for author: Dongzhi Jiang

Found 4 papers, 3 papers with code

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

1 code implementation · 19 Apr 2024 · Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu

Although some large-scale pretrained vision encoders, such as the vision encoders in CLIP and DINOv2, have shown promising performance, we found that no single vision encoder dominates across all kinds of image content understanding. For example, the CLIP vision encoder achieves outstanding results on general image understanding but performs poorly on document or chart content.

Language Modelling · Large Language Model

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

2 code implementations · 4 Apr 2024 · Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm.

Attribute · Image Captioning

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

1 code implementation · ICCV 2023 · Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu

The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and use this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that forcing the detector to capture both the spatial location and the temporal motion of objects at historical timestamps can lead to more accurate BEV feature learning.
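To make the HoP objective described above more concrete, here is a minimal Python sketch built entirely on toy stand-ins. BEV features are small nested lists, and the pseudo BEV feature for t-k is fused by simply averaging the maps of frames t-k-1 and t-k+1. That averaging is a hypothetical placeholder for whatever learned fusion the paper actually uses. The names `pseudo_bev`, `hop_training_loss`, `detect`, `det_loss`, and `weight` are also hypothetical. The sketch only illustrates the data flow: predict the object set at t-k from a feature that never sees frame t-k itself, then add that loss to the usual loss at t.

```python
from typing import Callable, Dict, List

Grid = List[List[float]]  # toy stand-in for a BEV feature map


def average_maps(maps: List[Grid]) -> Grid:
    """Hypothetical fusion: element-wise mean of several BEV maps of equal shape."""
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[r][c] for m in maps) / len(maps) for c in range(cols)]
        for r in range(rows)
    ]


def pseudo_bev(history: Dict[int, Grid], t: int, k: int) -> Grid:
    """Build a pseudo BEV feature for timestamp t-k from its adjacent frames only.

    Frame t-k itself is deliberately excluded, so the detector cannot simply
    reuse the real feature of that timestamp.
    """
    target = t - k
    neighbours = [history[s] for s in (target - 1, target + 1) if s in history]
    if not neighbours:
        raise ValueError(f"no adjacent frames available for timestamp {target}")
    return average_maps(neighbours)


def hop_training_loss(
    history: Dict[int, Grid],
    labels: Dict[int, float],
    t: int,
    k: int,
    detect: Callable[[Grid], float],
    det_loss: Callable[[float, float], float],
    weight: float = 0.5,  # hypothetical weight for the historical term
) -> float:
    """Loss at the current timestamp plus a weighted loss at timestamp t-k."""
    current = det_loss(detect(history[t]), labels[t])
    historical = det_loss(detect(pseudo_bev(history, t, k)), labels[t - k])
    return current + weight * historical
```

For example, with `history = {0: [[1.0, 1.0]], 1: [[2.0, 2.0]], 2: [[3.0, 3.0]], 3: [[4.0, 4.0]]}`, calling `pseudo_bev(history, 3, 2)` averages frames 0 and 2 to stand in for frame 1. Because the historical branch contributes only an extra loss term, it does not have to change the detector used at inference time.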

3D Object Detection · Object
