Search Results for author: Amita Kamath

Found 8 papers, 6 papers with code

Matryoshka Query Transformer for Large Vision-Language Models

1 code implementation · 29 May 2024 · WenBo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources?

What's "up" with vision-language models? Investigating their struggle with spatial reasoning

1 code implementation · 30 Oct 2023 · Amita Kamath, Jack Hessel, Kai-Wei Chang

Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"?

Text encoders bottleneck compositionality in contrastive vision-language models

1 code implementation · 24 May 2023 · Amita Kamath, Jack Hessel, Kai-Wei Chang

We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e.g., from a single object, to object+property, to multiple interacting objects).

Tasks: Attribute · Image Captioning · +1

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

1 code implementation · 28 Mar 2023 · Adyasha Maharana, Amita Kamath, Christopher Clark, Mohit Bansal, Aniruddha Kembhavi

As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support.

Webly Supervised Concept Expansion for General Purpose Vision Models

no code implementations · 4 Feb 2022 · Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills.

Tasks: Human-Object Interaction Detection · Image Retrieval · +4

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture

no code implementations · CVPR 2022 · Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Tasks: Question Answering · Visual Question Answering

Towards General Purpose Vision Systems

2 code implementations · 1 Apr 2021 · Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Tasks: Question Answering · Visual Question Answering

Selective Question Answering under Domain Shift

2 code implementations · ACL 2020 · Amita Kamath, Robin Jia, Percy Liang

In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy.

Tasks: Question Answering
