Search Results for author: Amita Kamath

Found 8 papers, 6 papers with code

Matryoshka Query Transformer for Large Vision-Language Models

1 code implementation · 29 May 2024 · WenBo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources?

What's "up" with vision-language models? Investigating their struggle with spatial reasoning

1 code implementation · 30 Oct 2023 · Amita Kamath, Jack Hessel, Kai-Wei Chang

Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"?

Text encoders bottleneck compositionality in contrastive vision-language models

1 code implementation · 24 May 2023 · Amita Kamath, Jack Hessel, Kai-Wei Chang

We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e.g., from a single object, to object+property, to multiple interacting objects).

Tasks: Attribute · Image Captioning · +1

Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

1 code implementation · 28 Mar 2023 · Adyasha Maharana, Amita Kamath, Christopher Clark, Mohit Bansal, Aniruddha Kembhavi

As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support.

Webly Supervised Concept Expansion for General Purpose Vision Models

no code implementations · 4 Feb 2022 · Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills.

Tasks: Human-Object Interaction Detection · Image Retrieval · +4

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture

no code implementations · CVPR 2022 · Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Tasks: Question Answering · Visual Question Answering

Towards General Purpose Vision Systems

2 code implementations · 1 Apr 2021 · Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Tasks: Question Answering · Visual Question Answering

Selective Question Answering under Domain Shift

2 code implementations · ACL 2020 · Amita Kamath, Robin Jia, Percy Liang

In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy.

Tasks: Question Answering
