no code implementations • NeurIPS 2023 • Di Qi, Tong Yang, Xiangyu Zhang
We hope our approach can provide a preliminary understanding of the physical world and help ease future research in 3D object-centric representation learning.
no code implementations • 22 Jan 2020 • Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti
In this paper, we introduce a new vision-language pre-trained model, ImageBERT, for image-text joint embedding.
Ranked #15 on Zero-Shot Cross-Modal Retrieval on COCO 2014