Search Results for author: Tanmay Gupta

Found 17 papers, 10 papers with code

m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks

1 code implementation 17 Mar 2024 Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

With m&m's, we evaluate 6 popular LLMs with 2 planning strategies (multi-step vs. step-by-step planning), 2 plan formats (JSON vs. code), and 3 types of feedback (parsing/verification/execution).

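As a rough illustration of the evaluation grid described above, the sketch below simply enumerates every combination of planning strategy, plan format, and feedback type; the model names and the evaluate_config helper are hypothetical placeholders, not the benchmark's actual API.

```python
from itertools import product

# Evaluation dimensions taken from the summary above.
planning_strategies = ["multi-step", "step-by-step"]
plan_formats = ["json", "code"]
feedback_types = ["parsing", "verification", "execution"]
llms = [f"llm_{i}" for i in range(6)]  # placeholder names for the 6 LLMs

def evaluate_config(llm, strategy, plan_format, feedback):
    """Hypothetical stand-in for running one benchmark configuration."""
    # A real harness would prompt `llm` to produce a tool-use plan in
    # `plan_format`, optionally return `feedback`, and score the plan.
    return {"llm": llm, "strategy": strategy,
            "format": plan_format, "feedback": feedback}

results = [evaluate_config(*cfg)
           for cfg in product(llms, planning_strategies,
                              plan_formats, feedback_types)]
print(len(results))  # 6 * 2 * 2 * 3 = 72 configurations
```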

Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

no code implementations23 Feb 2024 Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu

Prior work on selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain.
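
For context, selective prediction in its simplest form means abstaining whenever model confidence falls below a threshold; the sketch below illustrates only that generic mechanism (using the top softmax probability as the confidence score is an assumption here, not necessarily what the paper uses).

```python
import numpy as np

def selective_predict(logits: np.ndarray, threshold: float = 0.8):
    """Return (predictions, abstain) given per-example class logits.

    Abstains whenever the top softmax probability falls below `threshold`.
    """
    # Softmax over the class dimension.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    confidence = probs.max(axis=-1)
    predictions = probs.argmax(axis=-1)
    abstain = confidence < threshold
    return predictions, abstain

logits = np.array([[2.0, 0.1, 0.1],    # confident -> answer
                   [0.4, 0.3, 0.35]])  # uncertain -> abstain
preds, abstained = selective_predict(logits)
print(preds, abstained)
```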

Visual Programming: Compositional visual reasoning without training

1 code implementation CVPR 2023 Tanmay Gupta, Aniruddha Kembhavi

We present VISPROG, a neuro-symbolic approach to solving complex and compositional visual tasks given natural language instructions.

In-Context Learning, Question Answering +2
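
The general flavor of such neuro-symbolic visual programming is that a language model emits a short program of module calls which an interpreter then executes step by step; the toy modules and interpreter below are illustrative inventions, not VISPROG's actual module set or API.

```python
# Illustrative interpreter for a "visual program" of module calls.
# In VISPROG-style systems an LLM generates such a program from the
# natural language instruction; here the program is hard-coded and the
# modules are toy stand-ins.

def find_objects(image, query):           # hypothetical detection module
    return [f"{query}_box"]

def count(boxes):                         # hypothetical counting module
    return len(boxes)

MODULES = {"FIND": find_objects, "COUNT": count}

program = [
    ("boxes",  "FIND",  {"image": "IMAGE", "query": "dog"}),
    ("answer", "COUNT", {"boxes": "boxes"}),
]

def run(program, env):
    for out_var, module, kwargs in program:
        # Resolve arguments from the environment, then call the module.
        args = {k: env.get(v, v) for k, v in kwargs.items()}
        env[out_var] = MODULES[module](**args)
    return env

env = run(program, {"IMAGE": "image.jpg"})
print(env["answer"])  # 1
```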

GRIT: General Robust Image Task Benchmark

1 code implementation 28 Apr 2022 Tanmay Gupta, Ryan Marten, Aniruddha Kembhavi, Derek Hoiem

Computer vision models excel at making predictions when the test distribution closely resembles the training distribution.

Instance Segmentation, Keypoint Detection +7

Webly Supervised Concept Expansion for General Purpose Vision Models

no code implementations 4 Feb 2022 Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills.

Human-Object Interaction Detection, Image Retrieval +4

Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture

no code implementations CVPR 2022 Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Question Answering, Visual Question Answering

Visual Semantic Role Labeling for Video Understanding

1 code implementation CVPR 2021 Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi

We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling.

Semantic Role Labeling, Video Recognition +1
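
As a rough sketch of what a video semantic-role-labeling record might contain, the example below pairs a verb with its arguments for one clip; the field names and role labels are hypothetical, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class EventAnnotation:
    """One salient event in a video clip: a verb plus its semantic roles."""
    clip_span: tuple                            # (start_sec, end_sec) within the video
    verb: str                                   # the salient action
    roles: dict = field(default_factory=dict)   # role name -> entity description

event = EventAnnotation(
    clip_span=(2.0, 4.0),
    verb="throw",
    roles={"Arg0 (agent)": "man in red jacket",
           "Arg1 (object)": "ball",
           "ArgM (direction)": "towards the dog"},
)
print(event.verb, event.roles["Arg1 (object)"])
```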

Towards General Purpose Vision Systems

2 code implementations 1 Apr 2021 Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem

To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.

Question Answering, Visual Question Answering

Learning Curves for Analysis of Deep Networks

1 code implementation 21 Oct 2020 Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman

Learning curves model a classifier's test error as a function of the number of training samples.

Data Augmentation, Image Classification
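
A common parameterization of such a curve is a power law, err(n) ~ a + b * n^(-c), where a is the asymptotic error; the snippet below fits that generic form with SciPy on made-up measurements, purely as an illustration, since the paper's exact parameterization and fitting procedure may differ.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Test error as a function of training-set size n."""
    return a + b * np.power(n, -c)

# Hypothetical (n_samples, test_error) measurements.
n = np.array([100, 300, 1000, 3000, 10000], dtype=float)
err = np.array([0.45, 0.33, 0.25, 0.21, 0.19])

params, _ = curve_fit(power_law, n, err, p0=[0.1, 2.0, 0.5], maxfev=10000)
a, b, c = params
print(f"asymptotic error ~ {a:.3f}, predicted error at n=100k: "
      f"{power_law(1e5, a, b, c):.3f}")
```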

Contrastive Learning for Weakly Supervised Phrase Grounding

1 code implementation ECCV 2020 Tanmay Gupta, Arash Vahdat, Gal Chechik, Xiaodong Yang, Jan Kautz, Derek Hoiem

Given pairs of images and captions, we maximize compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions.

Contrastive Learning, Language Modelling +1
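
The core idea can be sketched as an InfoNCE-style objective that scores a caption's words against attention-weighted region features and contrasts the matching image with other images; the NumPy sketch below shows that generic recipe, not the paper's exact loss.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def word_region_score(word_feats, region_feats):
    """Attend each word over regions and return one compatibility score."""
    attn = softmax(word_feats @ region_feats.T)    # (W, R) attention weights
    attended = attn @ region_feats                 # (W, D) attention-weighted regions
    return float(np.sum(word_feats * attended))    # sum of word-region dot products

rng = np.random.default_rng(0)
W, R, D, B = 5, 7, 16, 4                           # words, regions, dim, batch size
words = rng.normal(size=(W, D))                    # caption word features
images = rng.normal(size=(B, R, D))                # images[0] is the true match

scores = np.array([word_region_score(words, img) for img in images])
loss = -np.log(softmax(scores)[0])                 # contrast true image vs. the rest
print(loss)
```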

ViCo: Word Embeddings from Visual Co-occurrences

1 code implementation ICCV 2019 Tanmay Gupta, Alexander Schwing, Derek Hoiem

Through unsupervised clustering, supervised partitioning, and a zero-shot-like generalization analysis we show that our word embeddings complement text-only embeddings like GloVe by better representing similarities and differences between visual concepts that are difficult to obtain from text corpora alone.

Attribute, Clustering +1
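
One standard way to turn a co-occurrence matrix into embeddings is positive PMI followed by a truncated SVD; the sketch below shows only that generic recipe on toy counts, not ViCo's actual training objective.

```python
import numpy as np

# Toy visual co-occurrence counts between 4 concepts.
concepts = ["dog", "leash", "car", "wheel"]
counts = np.array([[50, 30,  2,  1],
                   [30, 40,  1,  1],
                   [ 2,  1, 60, 35],
                   [ 1,  1, 35, 45]], dtype=float)

total = counts.sum()
p_ij = counts / total
p_i = counts.sum(axis=1) / total
pmi = np.log(np.maximum(p_ij / np.outer(p_i, p_i), 1e-12))
ppmi = np.maximum(pmi, 0.0)                    # positive PMI

# Truncated SVD gives low-dimensional "visual" embeddings.
u, s, _ = np.linalg.svd(ppmi)
embeddings = u[:, :2] * np.sqrt(s[:2])
for name, vec in zip(concepts, embeddings):
    print(name, np.round(vec, 3))
```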

No-Frills Human-Object Interaction Detection: Factorization, Layout Encodings, and Training Techniques

3 code implementations ICCV 2019 Tanmay Gupta, Alexander Schwing, Derek Hoiem

We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches.

Human-Object Interaction Detection, Object
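
The factorization can be illustrated as multiplying detector confidences with a small interaction term computed from box-layout features; the feature encoding and scorer below are simplified stand-ins, not the paper's exact encodings.

```python
import numpy as np

def layout_features(human_box, object_box):
    """Simple relative-geometry encoding of two (x1, y1, x2, y2) boxes."""
    hx, hy = (human_box[0] + human_box[2]) / 2, (human_box[1] + human_box[3]) / 2
    ox, oy = (object_box[0] + object_box[2]) / 2, (object_box[1] + object_box[3]) / 2
    hw, hh = human_box[2] - human_box[0], human_box[3] - human_box[1]
    return np.array([(ox - hx) / hw, (oy - hy) / hh])

def interaction_score(human_det, object_det, interaction_weights):
    """Factorized score: detector confidences x layout-based interaction term."""
    layout = layout_features(human_det["box"], object_det["box"])
    interaction_term = 1 / (1 + np.exp(-interaction_weights @ layout))  # sigmoid
    return human_det["score"] * object_det["score"] * interaction_term

human = {"box": (10, 10, 60, 110), "score": 0.9}
obj = {"box": (70, 40, 110, 80), "score": 0.8}
print(interaction_score(human, obj, interaction_weights=np.array([0.5, -0.2])))
```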

Imagine This! Scripts to Compositions to Videos

5 code implementations ECCV 2018 Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi

Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge.

Retrieval, World Knowledge

Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

no code implementations ICCV 2017 Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem

In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning.

Multi-Task Learning, Question Answering +1
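
The shared-embedding idea amounts to projecting image features and word features into one space and letting downstream tasks score image-word similarity there; the projection matrices and dimensions below are hypothetical, purely to illustrate the setup.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_TXT, D_SHARED = 512, 300, 128

# Hypothetical learned projections into a shared vision-language space.
W_img = rng.normal(scale=0.02, size=(D_IMG, D_SHARED))
W_txt = rng.normal(scale=0.02, size=(D_TXT, D_SHARED))

def embed(x, W):
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

image_feat = rng.normal(size=(1, D_IMG))   # e.g. a CNN image feature
word_feats = rng.normal(size=(3, D_TXT))   # e.g. word vectors for candidate labels

similarity = embed(image_feat, W_img) @ embed(word_feats, W_txt).T
print(similarity.round(3))  # downstream tasks score image-word pairs in this space
```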

3DFS: Deformable Dense Depth Fusion and Segmentation for Object Reconstruction from a Handheld Camera

no code implementations 15 Jun 2016 Tanmay Gupta, Daeyun Shin, Naren Sivagnanadasan, Derek Hoiem

The resulting depth maps are then fused using a proposed implicit surface function that is robust to estimation error, producing a smooth surface reconstruction of the entire scene.

3D Reconstruction, Depth Estimation +4
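
Fusing depth maps into an implicit surface is commonly done with a truncated signed distance function (TSDF) averaged across views; the 1-D sketch below shows that standard recipe, not the deformable, error-robust fusion the paper proposes.

```python
import numpy as np

# Voxel centers along a single camera ray (a 1-D slice of a volume).
voxels = np.linspace(0.0, 2.0, 21)
truncation = 0.2

tsdf = np.zeros_like(voxels)
weights = np.zeros_like(voxels)

# Noisy depth estimates of the same surface from several views.
for depth in [1.02, 0.98, 1.05]:
    sdf = np.clip(depth - voxels, -truncation, truncation) / truncation
    mask = (depth - voxels) > -truncation       # ignore voxels far behind the surface
    tsdf[mask] = (tsdf[mask] * weights[mask] + sdf[mask]) / (weights[mask] + 1)
    weights[mask] += 1

# The fused surface lies where the averaged TSDF changes sign (+ to -).
valid = weights > 0
sign = np.sign(tsdf[valid])
crossing = np.where(np.diff(sign) < 0)[0]
print("estimated surface depth ~", voxels[valid][crossing])
```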
