Search Results for author: Samyak Datta

Found 7 papers, 2 papers with code

DISGO: Automatic End-to-End Evaluation for Scene Text OCR

no code implementations • 25 Aug 2023 • Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, Lucas Kabela, Frank Seide, Samyak Datta, Jun Liu

This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds.

Machine Translation, Optical Character Recognition, +2


Episodic Memory Question Answering

no code implementations • CVPR 2022 • Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, Devi Parikh

Towards that end, we introduce (1) a new task - Episodic Memory Question Answering (EMQA) wherein an egocentric AI assistant is provided with a video sequence (the tour) and a question as an input and is asked to localize its answer to the question within the tour, (2) a dataset of grounded questions designed to probe the agent's spatio-temporal understanding of the tour, and (3) a model for the task that encodes the scene as an allocentric, top-down semantic feature map and grounds the question into the map to localize the answer.

Question Answering


Integrating Egocentric Localization for More Realistic Point-Goal Navigation Agents

no code implementations • 7 Sep 2020 • Samyak Datta, Oleksandr Maksymets, Judy Hoffman, Stefan Lee, Dhruv Batra, Devi Parikh

This enables a seamless adaptation to changing dynamics (a different robot or floor type) by simply re-calibrating the visual odometry model -- circumventing the expense of re-training the navigation policy.

Ranked #5 on Robot Navigation on Habitat 2020 Point Nav test-std

Navigate, Robot Navigation, +1


Embodied Question Answering in Photorealistic Environments with Point Cloud Perception

no code implementations • CVPR 2019 • Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra

To help bridge the gap between internet vision-style problems and the goal of vision for embodied perception, we instantiate a large-scale navigation task -- Embodied Question Answering [1] -- in photo-realistic environments (Matterport 3D).

Embodied Question Answering, Question Answering


Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment

no code implementations • ICCV 2019 • Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, Ajay Divakaran

We propose a novel end-to-end model that uses caption-to-image retrieval as a "downstream" task to guide the process of phrase localization.

Image Retrieval, Phrase Grounding, +2


Unsupervised Learning of Face Representations

1 code implementation • 3 Mar 2018 • Samyak Datta, Gaurav Sharma, C. V. Jawahar

Although faces extracted from videos have a lower spatial resolution than those available in standard supervised face datasets such as LFW and CASIA-WebFace, the former represent a much more realistic setting, e.g., surveillance scenarios where most of the detected faces are very small.


Embodied Question Answering

4 code implementations • CVPR 2018 • Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

We present a new AI task -- Embodied Question Answering (EmbodiedQA) -- where an agent is spawned at a random location in a 3D environment and asked a question ("What color is the car?").

Embodied Question Answering, Navigate, +3


