Search Results for author: Stefan Lee

Found 60 papers, 32 papers with code

VLSlice: Interactive Vision-and-Language Slice Discovery

1 code implementation • ICCV 2023 • Eric Slyman, Minsuk Kahng, Stefan Lee

Recent work in vision-and-language demonstrates that large-scale pretraining can learn generalizable models that are efficiently transferable to downstream tasks.

Behavioral Analysis of Vision-and-Language Navigation Agents

1 code implementation • CVPR 2023 • Zijiao Yang, Arjun Majumdar, Stefan Lee

To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings.

Vision and Language Navigation

Navigating to Objects Specified by Images

no code implementations • ICCV 2023 • Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

Our modular method solves sub-tasks of exploration, goal instance re-identification, goal localization, and local navigation.

Navigate Visual Reasoning

Emergence of Maps in the Memories of Blind Navigation Agents

no code implementations • 30 Jan 2023 • Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra

A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial.

Inductive Bias PointGoal Navigation

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

no code implementations • 29 Nov 2022 • Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot

We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image.

Visual Navigation

Retrospectives on the Embodied AI Workshop

no code implementations • 13 Oct 2022 • Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

We present a retrospective on the state of Embodied AI research.

Visual Navigation

Iterative Vision-and-Language Navigation

no code implementations • CVPR 2023 • Jacob Krantz, Shurjo Banerjee, Wang Zhu, Jason Corso, Peter Anderson, Stefan Lee, Jesse Thomason

We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time.

Instruction Following Vision and Language Navigation

Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments

no code implementations • 20 Apr 2022 • Jacob Krantz, Stefan Lee

Recent work in Vision-and-Language Navigation (VLN) has presented two environmental paradigms with differing realism -- the standard VLN setting built on topological environments where navigation is abstracted away, and the VLN-CE setting where agents must navigate continuous 3D environments using low-level actions.

Navigate Vision and Language Navigation

PROMPT: Learning Dynamic Resource Allocation Policies for Network Applications

no code implementations • 19 Jan 2022 • Drew Penney, Bin Li, Jaroslaw Sydir, Lizhong Chen, Charlie Tai, Stefan Lee, Eoin Walsh, Thomas Long

A growing number of service providers are exploring methods to improve server utilization and reduce power consumption by co-scheduling high-priority latency-critical workloads with best-effort workloads.

Scheduling

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

no code implementations • NeurIPS 2021 • Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location.

Object Scene Classification +2

Waypoint Models for Instruction-guided Navigation in Continuous Environments

1 code implementation • ICCV 2021 • Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets

Little inquiry has explicitly addressed the role of action spaces in language-guided visual navigation -- either in terms of its effect on navigation success or the efficiency with which a robotic agent could execute the resulting trajectory.

Instruction Following Visual Navigation

Improving Multilingual Translation by Representation and Gradient Regularization

1 code implementation • EMNLP 2021 • Yilin Yang, Akiko Eriguchi, Alexandre Muzio, Prasad Tadepalli, Stefan Lee, Hany Hassan

At the gradient level, we leverage a small amount of direct data (in thousands of sentence pairs) to regularize model gradients.

Machine Translation NMT +2

Piecewise-constant Neural ODEs

1 code implementation • 11 Jun 2021 • Sam Greydanus, Stefan Lee, Alan Fern

Neural networks are a popular tool for modeling sequential data but they generally do not treat time as a continuous variable.

Deep Convolution for Irregularly Sampled Temporal Point Clouds

no code implementations • 1 May 2021 • Erich Merrill, Stefan Lee, Li Fuxin, Thomas G. Dietterich, Alan Fern

We consider the problem of modeling the dynamics of continuous spatial-temporal processes represented by irregular samples through both space and time.

Starcraft Starcraft II

Jumpy Recurrent Neural Networks

no code implementations • 1 Jan 2021 • Samuel James Greydanus, Stefan Lee, Alan Fern

This structure enables our model to jump over long time intervals while retaining the ability to produce fine-grained or continuous-time predictions when necessary.

Time Series Time Series Analysis

THDA: Treasure Hunt Data Augmentation for Semantic Navigation

no code implementations • ICCV 2021 • Oleksandr Maksymets, Vincent Cartillier, Aaron Gokaslan, Erik Wijmans, Wojciech Galuba, Stefan Lee, Dhruv Batra

We show that this is a natural consequence of optimizing for the task metric (which in fact penalizes exploration), is enabled by powerful observation encoders, and is possible due to the finite set of training environment configurations.

Data Augmentation Navigate +2

Where Are You? Localization from Embodied Dialog

2 code implementations • EMNLP 2020 • Meera Hahn, Jacob Krantz, Dhruv Batra, Devi Parikh, James M. Rehg, Stefan Lee, Peter Anderson

In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.

Navigate Visual Dialog

Sim-to-Real Transfer for Vision-and-Language Navigation

1 code implementation • 7 Nov 2020 • Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee

We study the challenging problem of releasing a robot in a previously unseen environment, and having it follow unconstrained natural language navigation instructions.

Vision and Language Navigation

Language-Conditioned Imitation Learning for Robot Manipulation Tasks

1 code implementation • NeurIPS 2020 • Simon Stepputtis, Joseph Campbell, Mariano Phielipp, Stefan Lee, Chitta Baral, Heni Ben Amor

Imitation learning is a popular approach for teaching motor skills to robots.

Imitation Learning Robot Manipulation

DeepAveragers: Offline Reinforcement Learning by Solving Derived Non-Parametric MDPs

2 code implementations • ICLR 2021 • Aayam Shrestha, Stefan Lee, Prasad Tadepalli, Alan Fern

We study an approach to offline reinforcement learning (RL) based on optimally solving finitely-represented MDPs derived from a static dataset of experience.

Offline RL reinforcement-learning +1

On the Sub-Layer Functionalities of Transformer Decoder

no code implementations • Findings of the Association for Computational Linguistics 2020 • Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, Zhaopeng Tu

There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role.

Machine Translation NMT +1

Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views

1 code implementation • 2 Oct 2020 • Vincent Cartillier, Zhile Ren, Neha Jain, Stefan Lee, Irfan Essa, Dhruv Batra

We study the task of semantic mapping - specifically, an embodied agent (a robot or an egocentric AI assistant) is given a tour of a new environment and asked to build an allocentric top-down semantic map ("what is where?")

Representation Learning

Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents

no code implementations • 7 Sep 2020 • Samyak Datta, Oleksandr Maksymets, Judy Hoffman, Stefan Lee, Dhruv Batra, Devi Parikh

This enables a seamless adaptation to changing dynamics (a different robot or floor type) by simply re-calibrating the visual odometry model -- circumventing the expense of re-training the navigation policy.

Ranked #5 on Robot Navigation on Habitat 2020 Point Nav test-std

Navigate Robot Navigation +1

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

1 code implementation • NeurIPS 2020 • Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra

Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people?

Visual Dialog Visual Question Answering (VQA)

Extended Abstract: Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

no code implementations • ICML Workshop LaReL 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop near the sofa' requires an agent to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Vision and Language Navigation

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments – Extended Abstract

no code implementations • ICML Workshop LaReL 2020 • Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation

Improving Vision-and-Language Navigation with Image-Text Pairs from the Web

1 code implementation • ECCV 2020 • Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra

Following a navigation instruction such as 'Walk down the stairs and stop at the brown sofa' requires embodied AI agents to ground scene elements referenced via language (e.g., 'stairs') to visual content in the environment (pixels corresponding to 'stairs').

Ranked #6 on Vision and Language Navigation on VLN Challenge

Vision and Language Navigation

Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments

3 code implementations • ECCV 2020 • Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee

We develop a language-guided navigation task set in a continuous 3D environment where agents must execute low-level actions to follow natural language navigation directions.

Vision and Language Navigation

Sim2Real Predictivity: Does Evaluation in Simulation Predict Real-World Performance?

3 code implementations • 13 Dec 2019 • Abhishek Kadian, Joanne Truong, Aaron Gokaslan, Alexander Clegg, Erik Wijmans, Stefan Lee, Manolis Savva, Sonia Chernova, Dhruv Batra

Second, we investigate the sim2real predictivity of Habitat-Sim for PointGoal navigation.

PointGoal Navigation Visual Navigation

12-in-1: Multi-Task Vision and Language Representation Learning

5 code implementations • CVPR 2020 • Jiasen Lu, Vedanuj Goswami, Marcus Rohrbach, Devi Parikh, Stefan Lee

Much of vision-and-language research focuses on a small but diverse set of independent tasks and supporting datasets often studied in isolation; however, the visually-grounded language understanding skills required for success at these tasks overlap significantly.

Image Retrieval Question Answering +3

Question-Conditioned Counterfactual Image Generation for VQA

no code implementations • 14 Nov 2019 • Jingjing Pan, Yash Goyal, Stefan Lee

While Visual Question Answering (VQA) models continue to push the state-of-the-art forward, they largely remain black-boxes - failing to provide insight into how or why an answer is generated.

counterfactual Image Generation +2

DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames

8 code implementations • ICLR 2020 • Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra

We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.

Ranked #1 on PointGoal Navigation on Gibson PointGoal Navigation

Autonomous Navigation Navigate +2

Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation

no code implementations • IJCNLP 2019 • Arijit Ray, Karan Sikka, Ajay Divakaran, Stefan Lee, Giedrius Burachas

For instance, if a model answers "red" to "What color is the balloon?

Common Sense Reasoning Data Augmentation +4

ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks

11 code implementations • NeurIPS 2019 • Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee

We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.

Ranked #5 on Referring Expression Comprehension on Talk2Car

Image Retrieval Question Answering +5

Chasing Ghosts: Instruction Following as Bayesian State Tracking

1 code implementation • NeurIPS 2019 • Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee

Our experiments show that our approach outperforms a strong LingUNet baseline when predicting the goal location on the map.

Instruction Following Vision and Language Navigation

Emergence of Compositional Language with Deep Generational Transmission

1 code implementation • ICLR 2020 • Michael Cogswell, Jiasen Lu, Stefan Lee, Devi Parikh, Dhruv Batra

In this paper, we introduce these cultural evolutionary dynamics into language emergence by periodically replacing agents in a population to create a knowledge gap, implicitly inducing cultural transmission of language.

Reinforcement Learning (RL)

Counterfactual Visual Explanations

1 code implementation • 16 Apr 2019 • Yash Goyal, Ziyan Wu, Jan Ernst, Dhruv Batra, Devi Parikh, Stefan Lee

In this work, we develop a technique to produce counterfactual visual explanations.

counterfactual General Classification +1

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

no code implementations • CVPR 2019 • Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception we instantiate a large-scale navigation task -- Embodied Question Answering [1] in photo-realistic environments (Matterport 3D).

Embodied Question Answering Question Answering

Probabilistic Neural-symbolic Models for Interpretable Visual Question Answering

no code implementations • ICLR 2019 • Ramakrishna Vedantam, Karan Desai, Stefan Lee, Marcus Rohrbach, Dhruv Batra, Devi Parikh

We propose a new class of probabilistic neural-symbolic models, that have symbolic functional programs as a latent, stochastic variable.

counterfactual Question Answering +1

Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded

no code implementations • ICCV 2019 • Ramprasaath R. Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, Larry Heck, Dhruv Batra, Devi Parikh

Many vision and language models suffer from poor visual grounding - often falling back on easy-to-learn language priors rather than basing their decisions on visual concepts in the image.

Image Captioning Question Answering +2

EvalAI: Towards Better Evaluation Systems for AI Agents

3 code implementations • 10 Feb 2019 • Deshraj Yadav, Rishabh Jain, Harsh Agrawal, Prithvijit Chattopadhyay, Taranjeet Singh, Akash Jain, Shiv Baran Singh, Stefan Lee, Dhruv Batra

We introduce EvalAI, an open source platform for evaluating and comparing machine learning (ML) and artificial intelligence (AI) algorithms at scale.

Benchmarking BIG-bench Machine Learning

Audio-Visual Scene-Aware Dialog

2 code implementations • 25 Jan 2019 • Huda Alamri, Vincent Cartillier, Abhishek Das, Jue Wang, Anoop Cherian, Irfan Essa, Dhruv Batra, Tim K. Marks, Chiori Hori, Peter Anderson, Stefan Lee, Devi Parikh

We introduce the task of scene-aware dialog.

Scene-Aware Dialogue

nocaps: novel object captioning at scale

2 code implementations • ICCV 2019 • Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.

Image Captioning Object +2

Neural Modular Control for Embodied Question Answering

2 code implementations • 26 Oct 2018 • Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

We use imitation learning to warm-start policies at each level of the hierarchy, dramatically increasing sample efficiency, followed by reinforcement learning.

Embodied Question Answering Imitation Learning +3

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

no code implementations • NeurIPS 2018 • Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee

Further, on standard VQA tasks, our approach shows significantly less drop in accuracy compared to existing bias-reducing VQA models.

Question Answering Visual Grounding +1

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

no code implementations • 1 Oct 2018 • Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

Our question generation policy generalizes to new environments and a new pair of eyes, i.e., a new visual system.

Question Generation Question-Generation

Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance

1 code implementation • ECCV 2018 • Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee

Our approach, which we call Neuron Importance-Aware Weight Transfer (NIWT), learns to map domain knowledge about novel "unseen" classes onto this dictionary of learned concepts and then optimizes for network parameters that can effectively combine these concepts - essentially learning classifiers by discovering and composing learned semantic concepts in deep networks.

Generalized Zero-Shot Learning

Graph R-CNN for Scene Graph Generation

3 code implementations • ECCV 2018 • Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

We propose a novel scene graph generation model called Graph R-CNN, that is both effective and efficient at detecting objects and their relations in images.

Ranked #12 on Scene Graph Generation on Visual Genome

Graph Generation Scene Graph Generation

Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations

no code implementations • ICML 2018 • Ashwin Kalyan, Stefan Lee, Anitha Kannan, Dhruv Batra

Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being correct for an input (e.g., there are many ways of describing an image and multiple ways of translating a sentence); however, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g., all English sentences).

Multi-Label Classification Question Generation +3

Embodied Question Answering

4 code implementations • CVPR 2018 • Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?").

Embodied Question Answering Navigate +3

Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog

1 code implementation • EMNLP 2017 • Satwik Kottur, José Moura, Stefan Lee, Dhruv Batra

A number of recent works have proposed techniques for end-to-end learning of communication protocols among cooperative multi-agent populations, and have simultaneously found the emergence of grounded human-interpretable language in the protocols developed by the agents, learned without any human supervision!

Slot Filling

Evaluating Visual Conversational Agents via Cooperative Human-AI Games

no code implementations • 17 Aug 2017 • Prithvijit Chattopadhyay, Deshraj Yadav, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, Devi Parikh

This suggests a mismatch between benchmarking of AI in isolation and in the context of human-AI teams.

Benchmarking

Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog

3 code implementations • 26 Jun 2017 • Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

no code implementations • CVPR 2017 • Qing Sun, Stefan Lee, Dhruv Batra

We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason about both forward and backward time dependencies.

Image Captioning Sentence

The Promise of Premise: Harnessing Question Premises in Visual Question Answering

1 code implementation • EMNLP 2017 • Aroma Mahendru, Viraj Prabhu, Akrit Mohapatra, Dhruv Batra, Stefan Lee

In this paper, we make a simple observation that questions about images often contain premises - objects and relationships implied by the question - and that reasoning about premises can help Visual Question Answering (VQA) models respond more intelligently to irrelevant or previously unseen questions.

Question Answering Visual Question Answering

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

7 code implementations • ICCV 2017 • Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra

Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.

reinforcement-learning Reinforcement Learning (RL) +2

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

25 code implementations • 7 Oct 2016 • Ashwin K. Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra

We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.

Image Captioning Machine Translation +4

Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles

no code implementations • NeurIPS 2016 • Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David Crandall, Dhruv Batra

Many practical perception systems exist within larger processes that include interactions with users or additional components capable of evaluating the quality of predicted solutions.

Multiple-choice

Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions

no code implementations • ICCV 2015 • Sven Bambach, Stefan Lee, David J. Crandall, Chen Yu

Hands appear very often in egocentric video, and their appearance and pose give important cues about what people are doing and what they are paying attention to.

Hand Detection Hand Segmentation

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

no code implementations • 19 Nov 2015 • Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, Dhruv Batra

Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks.
