no code implementations • 19 Mar 2024 • Anita Rau, Josiah Aklilu, F. Christopher Holsinger, Serena Yeung-Levy
This work proposes a novel approach to uncertainty in depth priors for NeRF supervision.
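The general idea of folding a prior's uncertainty into depth supervision can be sketched with a standard heteroscedastic weighting scheme (a minimal illustration, not the paper's actual method; all names and the toy numbers are hypothetical):

```python
import numpy as np

def uncertainty_weighted_depth_loss(pred_depth, prior_depth, sigma):
    """Heteroscedastic depth loss: residuals against the depth prior are
    down-weighted where the prior is uncertain (large sigma), while a
    log-sigma penalty keeps sigma from being inflated everywhere."""
    residual = (pred_depth - prior_depth) ** 2 / (2.0 * sigma ** 2)
    regularizer = np.log(sigma)
    return float(np.mean(residual + regularizer))

# Toy example: two rays with confident priors, one with an uncertain prior.
pred  = np.array([2.0, 4.1, 7.5])   # depths rendered by the NeRF
prior = np.array([2.1, 4.0, 6.0])   # depths from an external prior (e.g. SfM)
sigma = np.array([0.1, 0.1, 2.0])   # per-ray uncertainty of the prior

loss = uncertainty_weighted_depth_loss(pred, prior, sigma)
```

The third ray disagrees strongly with its prior, but its large sigma shrinks that residual's contribution, so an unreliable prior cannot dominate the supervision signal.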
1 code implementation • 19 Mar 2024 • Elaine Sui, Xiaohan Wang, Serena Yeung-Levy
Advancements in vision-language models (VLMs) have propelled the field of computer vision, particularly in the zero-shot learning setting.
no code implementations • 15 Mar 2024 • Xiaohan Wang, Yuhui Zhang, Orr Zohar, Serena Yeung-Levy
Long-form video understanding represents a significant challenge within computer vision, demanding a model capable of reasoning over long multi-modal sequences.
Ranked #1 on Zero-Shot Video Question Answer on NExT-QA
no code implementations • 12 Mar 2024 • Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Serena Yeung-Levy, Curtis P. Langlotz, Sheng Wang, Hoifung Poon
Frontier models such as GPT-4V still have major competency gaps in multimodal capabilities for biomedical applications.
no code implementations • 26 Feb 2024 • Zeyu Wang, Zhenzhen Weng, Serena Yeung-Levy
Conventional approaches to human mesh recovery predominantly employ a region-based strategy.
1 code implementation • 25 Jan 2024 • Sanket Rajan Gupte, Josiah Aklilu, Jeffrey J. Nirschl, Serena Yeung-Levy
Foundation vision or vision-language models are trained on large unlabeled or noisy data and learn robust representations that can achieve impressive zero- or few-shot performance on diverse tasks.
no code implementations • 22 Jan 2024 • Zhenzhen Weng, Jingyuan Liu, Hao Tan, Zhan Xu, Yang Zhou, Serena Yeung-Levy, Jimei Yang
We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image.
1 code implementation • 16 Jan 2024 • Yuhui Zhang, Elaine Sui, Serena Yeung-Levy
However, this assumption is under-explored due to the poorly understood geometry of the multi-modal contrastive space, where a modality gap exists.
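The modality gap referenced here can be illustrated numerically: in CLIP-style contrastive spaces, image and text embeddings occupy separated regions of the unit sphere, and the gap is often summarized as the vector between the two modality centroids. A minimal numpy sketch with synthetic embeddings (stand-ins for real model outputs, not the paper's procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # Project rows onto the unit sphere, as contrastive embeddings usually are.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Hypothetical embeddings: text vectors get a constant offset to mimic
# the modality gap observed in real multi-modal contrastive spaces.
image_emb = normalize(rng.normal(size=(100, 512)))
text_emb  = normalize(rng.normal(size=(100, 512)) + 0.5)

# The gap as the vector between modality centroids, and its magnitude.
gap_vector = image_emb.mean(axis=0) - text_emb.mean(axis=0)
gap_size = np.linalg.norm(gap_vector)

# One simple closing step: translate text embeddings by the gap vector
# and re-normalize, moving both clouds onto a shared region of the sphere.
closed_text = normalize(text_emb + gap_vector)
closed_gap = np.linalg.norm(image_emb.mean(axis=0) - closed_text.mean(axis=0))
```

After the translation, the centroid distance shrinks, which is the geometric intuition behind treating one modality's embeddings as a proxy for the other's once the gap is accounted for.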
1 code implementation • 5 Dec 2023 • Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy
To aid in this discovery process, we explore the task of automatically describing the differences between two $\textbf{sets}$ of images, which we term Set Difference Captioning.
1 code implementation • 16 Mar 2023 • Zhenzhen Weng, Laura Bravo-Sánchez, Serena Yeung-Levy
Recent text-to-image generative models have exhibited remarkable abilities in generating high-fidelity and photo-realistic images.