1 code implementation • 26 Jan 2023 • Scott Geng, Revant Teotia, Purva Tendulkar, Sachit Menon, Carl Vondrick
We introduce a video framework for modeling the association between verbal and non-verbal communication during dyadic conversation.
1 code implementation • CVPR 2023 • Chengzhi Mao, Revant Teotia, Amrutha Sundar, Sachit Menon, Junfeng Yang, Xin Wang, Carl Vondrick
We propose a ``doubly right'' object recognition benchmark, where the metric requires the model to simultaneously produce both the right labels as well as the right rationales.
no code implementations • 16 Oct 2022 • Prajwal Gatti, Abhirama Subramanyam Penamakuri, Revant Teotia, Anand Mishra, Shubhashis Sengupta, Roshni Ramnani
To enable both commonsense and factual reasoning in the image search, we present a unified framework, namely Knowledge Retrieval-Augmented Multimodal Transformer (KRAMT), that treats the named visual entities in an image as a gateway to encyclopedic knowledge and leverages them along with natural language query to ground relevant knowledge.
1 code implementation • ICCV 2021 • Revant Teotia, Vaibhav Mishra, Mayank Maheshwari, Anand Mishra
In this paper, given a small bag of images, each containing a common but latent predicate, we are interested in localizing visual subject-object pairs connected via the common predicate in each of the images.