Search Results for author: Amanmeet Garg

Found 4 papers, 2 papers with code

Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models

1 code implementation • 5 Nov 2023 • Jingru Yi, Burak Uzkent, Oana Ignat, Zili Li, Amanmeet Garg, Xiang Yu, Linda Liu

While we demonstrate our data augmentation method with the MDETR framework, the proposed approach is applicable to common grounding-based vision and language tasks with other frameworks.

Data Augmentation • Phrase Grounding • +1

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

no code implementations • ICCV 2023 • Sarah Ibrahimi, Xiaohang Sun, Pichao Wang, Amanmeet Garg, Ashutosh Sanan, Mohamed Omar

Nonetheless, the objective of the text-to-video retrieval task is to capture the complementary audio and video information that is pertinent to the text query rather than simply achieving better audio and video alignment.

Ranked #10 on Video Retrieval on MSR-VTT

Retrieval • Text to Video Retrieval • +2

Dynamic Inference With Grounding Based Vision and Language Models

no code implementations • CVPR 2023 • Burak Uzkent, Amanmeet Garg, Wentao Zhu, Keval Doshi, Jingru Yi, Xiaolong Wang, Mohamed Omar

For example, recent image and language models with more than 200M parameters have been proposed to learn visual grounding in the pre-training step and show impressive results on downstream vision and language tasks.

Language Modelling • Referring Expression • +3

PodSumm -- Podcast Audio Summarization

1 code implementation • 22 Sep 2020 • Aneesh Vartakavi, Amanmeet Garg

Listeners often rely on text descriptions of episodes provided by the podcast creators to discover new content.

Data Augmentation • Specificity
