Search Results for author: Tanmay Gupta

Found 17 papers, 10 papers with code

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

1 code implementation 17 Mar 2024 Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

With m&m's, we evaluate 6 popular LLMs with 2 planning strategies (multi-step vs. step-by-step planning), 2 plan formats (JSON vs. code), and 3 types of feedback (parsing/verification/execution).

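As a rough illustration of the evaluation grid described above, the sketch below simply enumerates every combination of planning strategy, plan format, and feedback type; the model names and the evaluate_config helper are hypothetical placeholders, not the benchmark's actual API.

```python
from itertools import product

# Evaluation dimensions taken from the summary above.
planning_strategies = ["multi-step", "step-by-step"]
plan_formats = ["json", "code"]
feedback_types = ["parsing", "verification", "execution"]
llms = [f"llm_{i}" for i in range(6)]  # placeholder names for the 6 LLMs

def evaluate_config(llm, strategy, plan_format, feedback):
    """Hypothetical stand-in for running one benchmark configuration."""
    # A real harness would prompt `llm` to produce a tool-use plan in
    # `plan_format`, optionally return `feedback`, and score the plan.
    return {"llm": llm, "strategy": strategy,
            "format": plan_format, "feedback": feedback}

results = [evaluate_config(*cfg)
           for cfg in product(llms, planning_strategies,
                              plan_formats, feedback_types)]
print(len(results))  # 6 * 2 * 2 * 3 = 72 configurations
```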

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

no code implementations23 Feb 2024 Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu

Prior work on selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain.
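
For context, selective prediction in its simplest form means abstaining whenever model confidence falls below a threshold; the sketch below illustrates only that generic mechanism (using the top softmax probability as the confidence score is an assumption here, not necessarily what the paper uses).

```python
import numpy as np

def selective_predict(logits: np.ndarray, threshold: float = 0.8):
    """Return (predictions, abstain) given per-example class logits.

    Abstains whenever the top softmax probability falls below `threshold`.
    """
    # Softmax over the class dimension.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    confidence = probs.max(axis=-1)
    predictions = probs.argmax(axis=-1)
    abstain = confidence < threshold
    return predictions, abstain

logits = np.array([[2.0, 0.1, 0.1],    # confident -> answer
                   [0.4, 0.3, 0.35]])  # uncertain -> abstain
preds, abstained = selective_predict(logits)
print(preds, abstained)
```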

Visual Programming: Compositional visual reasoning without training

1 code implementation CVPR 2023 Tanmay Gupta, Aniruddha Kembhavi

We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions.

In-Context Learning, Question Answering +2
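
The general flavor of such neuro-symbolic visual programming is that a language model emits a short program of module calls which an interpreter then executes step by step; the toy modules and interpreter below are illustrative inventions, not VISPROG's actual module set or API.

```python
# Illustrative interpreter for a "visual program" of module calls.
# In VISPROG-style systems an LLM generates such a program from the
# natural language instruction; here the program is hard-coded and the
# modules are toy stand-ins.

def find_objects(image, query):           # hypothetical detection module
    return [f"{query}_box"]

def count(boxes):                         # hypothetical counting module
    return len(boxes)

MODULES = {"FIND": find_objects, "COUNT": count}

program = [
    ("boxes",  "FIND",  {"image": "IMAGE", "query": "dog"}),
    ("answer", "COUNT", {"boxes": "boxes"}),
]

def run(program, env):
    for out_var, module, kwargs in program:
        # Resolve arguments from the environment, then call the module.
        args = {k: env.get(v, v) for k, v in kwargs.items()}
        env[out_var] = MODULES[module](**args)
    return env

env = run(program, {"IMAGE": "image.jpg"})
print(env["answer"])  # 1
```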

GRIT: General Robust Image Task Benchmark

1 code implementation 28 Apr 2022 Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem

Computer vision models excel at making predictions when the test distribution closely resembles the training distribution.

Instance Segmentation, Keypoint Detection +7

Webly Supervised Concept Expansion for General Purpose Vision Models

no code implementations 4 Feb 2022 Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills.

Human-Object Interaction Detection, Image Retrieval +4

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture

no code implementations CVPR 2022 Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Question Answering, Visual Question Answering

Visual Semantic Role Labeling for Video Understanding

1 code implementation CVPR 2021 Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi

We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling.

Semantic Role Labeling, Video Recognition +1
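
As a rough sketch of what a video semantic-role-labeling record might contain, the example below pairs a verb with its arguments for one clip; the field names and role labels are hypothetical, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class EventAnnotation:
    """One salient event in a video clip: a verb plus its semantic roles."""
    clip_span: tuple                            # (start_sec, end_sec) within the video
    verb: str                                   # the salient action
    roles: dict = field(default_factory=dict)   # role name -> entity description

event = EventAnnotation(
    clip_span=(2.0, 4.0),
    verb="throw",
    roles={"Arg0 (agent)": "man in red jacket",
           "Arg1 (object)": "ball",
           "ArgM (direction)": "towards the dog"},
)
print(event.verb, event.roles["Arg1 (object)"])
```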

Towards General Purpose Vision Systems

2 code implementations 1 Apr 2021 Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Question Answering, Visual Question Answering

Learning Curves for Analysis of Deep Networks

1 code implementation 21 Oct 2020 Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman

Learning curves model a classifier's test error as a function of the number of training samples.

Data Augmentation, Image Classification
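
A common parameterization of such a curve is a power law, err(n) ~ a + b * n^(-c), where a is the asymptotic error; the snippet below fits that generic form with SciPy on made-up measurements, purely as an illustration, since the paper's exact parameterization and fitting procedure may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Test error as a function of training-set size n."""
    return a + b * np.power(n, -c)

# Hypothetical (n_samples, test_error) measurements.
n = np.array([100, 300, 1000, 3000, 10000], dtype=float)
err = np.array([0.45, 0.33, 0.25, 0.21, 0.19])

params, _ = curve_fit(power_law, n, err, p0=[0.1, 2.0, 0.5], maxfev=10000)
a, b, c = params
print(f"asymptotic error ~ {a:.3f}, predicted error at n=100k: "
      f"{power_law(1e5, a, b, c):.3f}")
```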

Contrastive Learning for Weakly Supervised Phrase Grounding

1 code implementation ECCV 2020 Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem

Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.

Contrastive Learning, Language Modelling +1
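
The core idea can be sketched as an InfoNCE-style objective that scores a caption's words against attention-weighted region features and contrasts the matching image with other images; the NumPy sketch below shows that generic recipe, not the paper's exact loss.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def word_region_score(word_feats, region_feats):
    """Attend each word over regions and return one compatibility score."""
    attn = softmax(word_feats @ region_feats.T)    # (W, R) attention weights
    attended = attn @ region_feats                 # (W, D) attention-weighted regions
    return float(np.sum(word_feats * attended))    # sum of word-region dot products

rng = np.random.default_rng(0)
W, R, D, B = 5, 7, 16, 4                           # words, regions, dim, batch size
words = rng.normal(size=(W, D))                    # caption word features
images = rng.normal(size=(B, R, D))                # images[0] is the true match

scores = np.array([word_region_score(words, img) for img in images])
loss = -np.log(softmax(scores)[0])                 # contrast true image vs. the rest
print(loss)
```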

ViCo: Word Embeddings from Visual Co-occurrences

1 code implementation ICCV 2019 Tanmay Gupta, Alexander Schwing, Derek Hoiem

Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone.

Attribute, Clustering +1
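
One standard way to turn a co-occurrence matrix into embeddings is positive PMI followed by a truncated SVD; the sketch below shows only that generic recipe on toy counts, not ViCo's actual training objective.

```python
import numpy as np

# Toy visual co-occurrence counts between 4 concepts.
concepts = ["dog", "leash", "car", "wheel"]
counts = np.array([[50, 30,  2,  1],
                   [30, 40,  1,  1],
                   [ 2,  1, 60, 35],
                   [ 1,  1, 35, 45]], dtype=float)

total = counts.sum()
p_ij = counts / total
p_i = counts.sum(axis=1) / total
pmi = np.log(np.maximum(p_ij / np.outer(p_i, p_i), 1e-12))
ppmi = np.maximum(pmi, 0.0)                    # positive PMI

# Truncated SVD gives low-dimensional "visual" embeddings.
u, s, _ = np.linalg.svd(ppmi)
embeddings = u[:, :2] * np.sqrt(s[:2])
for name, vec in zip(concepts, embeddings):
    print(name, np.round(vec, 3))
```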

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

3 code implementations ICCV 2019 Tanmay Gupta, Alexander Schwing, Derek Hoiem

We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches.

Human-Object Interaction Detection, Object
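
The factorization can be illustrated as multiplying detector confidences with a small interaction term computed from box-layout features; the feature encoding and scorer below are simplified stand-ins, not the paper's exact encodings.

```python
import numpy as np

def layout_features(human_box, object_box):
    """Simple relative-geometry encoding of two (x1, y1, x2, y2) boxes."""
    hx, hy = (human_box[0] + human_box[2]) / 2, (human_box[1] + human_box[3]) / 2
    ox, oy = (object_box[0] + object_box[2]) / 2, (object_box[1] + object_box[3]) / 2
    hw, hh = human_box[2] - human_box[0], human_box[3] - human_box[1]
    return np.array([(ox - hx) / hw, (oy - hy) / hh])

def interaction_score(human_det, object_det, interaction_weights):
    """Factorized score: detector confidences x layout-based interaction term."""
    layout = layout_features(human_det["box"], object_det["box"])
    interaction_term = 1 / (1 + np.exp(-interaction_weights @ layout))  # sigmoid
    return human_det["score"] * object_det["score"] * interaction_term

human = {"box": (10, 10, 60, 110), "score": 0.9}
obj = {"box": (70, 40, 110, 80), "score": 0.8}
print(interaction_score(human, obj, interaction_weights=np.array([0.5, -0.2])))
```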

Imagine This! Scripts to Compositions to Videos

5 code implementations ECCV 2018 Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi

Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge.

Retrieval, World Knowledge

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

no code implementations ICCV 2017 Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem

In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning.

Multi-Task Learning, Question Answering +1
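
The shared-embedding idea amounts to projecting image features and word features into one space and letting downstream tasks score image-word similarity there; the projection matrices and dimensions below are hypothetical, purely to illustrate the setup.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_SHARED = 512, 300, 128

# Hypothetical learned projections into a shared vision-language space.
W_img = rng.normal(scale=0.02, size=(D_IMG, D_SHARED))
W_txt = rng.normal(scale=0.02, size=(D_TXT, D_SHARED))

def embed(x, W):
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

image_feat = rng.normal(size=(1, D_IMG))   # e.g. a CNN image feature
word_feats = rng.normal(size=(3, D_TXT))   # e.g. word vectors for candidate labels

similarity = embed(image_feat, W_img) @ embed(word_feats, W_txt).T
print(similarity.round(3))  # downstream tasks score image-word pairs in this space
```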

3DFS: Deformable Dense Depth Fusion and Segmentation for Object Reconstruction from a Handheld Camera

no code implementations 15 Jun 2016 Tanmay Gupta, Daeyun Shin, Naren Sivagnanadasan, Derek Hoiem

The resulting depth maps are then fused using a proposed implicit surface function that is robust to estimation error, producing a smooth surface reconstruction of the entire scene.

3D Reconstruction, Depth Estimation +4
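
Fusing depth maps into an implicit surface is commonly done with a truncated signed distance function (TSDF) averaged across views; the 1-D sketch below shows that standard recipe, not the deformable, error-robust fusion the paper proposes.

```python
import numpy as np

# Voxel centers along a single camera ray (a 1-D slice of a volume).
voxels = np.linspace(0.0, 2.0, 21)
truncation = 0.2

tsdf = np.zeros_like(voxels)
weights = np.zeros_like(voxels)

# Noisy depth estimates of the same surface from several views.
for depth in [1.02, 0.98, 1.05]:
    sdf = np.clip(depth - voxels, -truncation, truncation) / truncation
    mask = (depth - voxels) > -truncation       # ignore voxels far behind the surface
    tsdf[mask] = (tsdf[mask] * weights[mask] + sdf[mask]) / (weights[mask] + 1)
    weights[mask] += 1

# The fused surface lies where the averaged TSDF changes sign (+ to -).
valid = weights > 0
sign = np.sign(tsdf[valid])
crossing = np.where(np.diff(sign) < 0)[0]
print("estimated surface depth ~", voxels[valid][crossing])
```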
