1 code implementation • 6 Sep 2023 • Noriyuki Kojima, Hadar Averbuch-Elor, Yoav Artzi
Key to tasks that require reasoning about natural language in visual contexts is grounding words and phrases to image regions.
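As an illustration of the grounding setup (a minimal sketch, not the paper's method), the following scores precomputed image-region features against a phrase embedding by cosine similarity; the feature shapes and function name are assumptions for this example.

```python
import numpy as np

def ground_phrase(phrase_emb: np.ndarray, region_feats: np.ndarray) -> int:
    """Return the index of the image region whose feature vector best
    matches the phrase embedding under cosine similarity.

    phrase_emb:   (d,) embedding of the word or phrase
    region_feats: (n, d) feature vectors for n candidate image regions
    """
    phrase = phrase_emb / np.linalg.norm(phrase_emb)
    regions = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    scores = regions @ phrase          # (n,) cosine similarities
    return int(np.argmax(scores))

# toy usage with random features
rng = np.random.default_rng(0)
print(ground_phrase(rng.normal(size=128), rng.normal(size=(5, 128))))
```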
no code implementations • 29 Nov 2022 • Anya Ji, Noriyuki Kojima, Noah Rush, Alane Suhr, Wai Keen Vong, Robert D. Hawkins, Yoav Artzi
We introduce KiloGram, a resource for studying abstract visual reasoning in humans and machines.
no code implementations • 3 Nov 2022 • Anne Wu, Kianté Brantley, Noriyuki Kojima, Yoav Artzi
We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments.
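For intuition about what "language-conditioned" means here, below is a toy environment with a gym-style reset/step loop where the instruction is part of the observation. The class, observation keys, and trivial policy are hypothetical, not lilGym's actual API.

```python
import random

class LangCondEnv:
    """Toy language-conditioned environment: the agent is rewarded for
    outputting the target word named in the instruction."""
    def __init__(self, vocab):
        self.vocab = vocab
    def reset(self):
        self.target = random.choice(self.vocab)
        return {"instruction": f"select {self.target}"}
    def step(self, action):
        reward = 1.0 if action == self.target else 0.0
        return {"instruction": f"select {self.target}"}, reward, True, {}

env = LangCondEnv(["circle", "square", "triangle"])
obs = env.reset()
action = obs["instruction"].split()[-1]   # trivial policy: parse the instruction
obs, reward, done, info = env.step(action)
print(reward)  # 1.0
```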
1 code implementation • 11 Oct 2022 • Yuntian Deng, Noriyuki Kojima, Alexander M. Rush
Our experiments verify the effectiveness of both the diffusion process and scheduled sampling in fixing generation issues.
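A minimal sketch of scheduled sampling in a diffusion training step, assuming a simple eps-prediction parameterization: with some probability, the training input is re-noised from the model's own x0 estimate instead of the ground truth, exposing the model to its own predictions. The toy denoiser and all names are illustrative, not the paper's implementation.

```python
import torch

def noisy_sample(x0, t, alphas_cumprod):
    """Forward diffusion: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps."""
    a = alphas_cumprod[t].view(-1, 1)          # (B, 1) cumulative alphas
    eps = torch.randn_like(x0)
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

def training_step(model, x0, alphas_cumprod, sample_prob=0.5):
    t = torch.randint(0, len(alphas_cumprod), (x0.size(0),))
    x_t, eps = noisy_sample(x0, t, alphas_cumprod)
    if torch.rand(()).item() < sample_prob:
        # scheduled sampling: re-noise from the model's own x0 estimate
        with torch.no_grad():
            a = alphas_cumprod[t].view(-1, 1)
            x0_hat = (x_t - (1 - a).sqrt() * model(x_t, t)) / a.sqrt()
        x_t, eps = noisy_sample(x0_hat, t, alphas_cumprod)
    return torch.mean((model(x_t, t) - eps) ** 2)  # eps-prediction loss

class TinyEps(torch.nn.Module):                # stand-in denoiser
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Linear(dim + 1, dim)
    def forward(self, x, t):
        return self.net(torch.cat([x, t.float().unsqueeze(1)], dim=1))

alphas_cumprod = torch.linspace(0.99, 0.01, 100)
loss = training_step(TinyEps(16), torch.randn(8, 16), alphas_cumprod)
loss.backward()
```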
no code implementations • 10 Aug 2021 • Noriyuki Kojima, Alane Suhr, Yoav Artzi
We study continual learning for natural language instruction generation, by observing human users' instruction execution.
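A hypothetical skeleton of the learning signal (illustrative only, not the paper's algorithm): instruction templates are reweighted by whether users execute the generated instruction successfully.

```python
import random

# hypothetical templates; weights grow when users follow them correctly
templates = {"walk to the {x}": 1.0, "go near the {x}": 1.0}

def generate(target):
    """Sample a template in proportion to its current weight."""
    total = sum(templates.values())
    r, acc = random.uniform(0, total), 0.0
    for tpl, w in templates.items():
        acc += w
        if r <= acc:
            return tpl, tpl.format(x=target)

def observe_execution(tpl, success, lr=0.1):
    """Reward-weighted update from observed human execution."""
    templates[tpl] *= (1 + lr) if success else (1 - lr)

tpl, instruction = generate("red chair")
observe_execution(tpl, success=True)
```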
no code implementations • CVPR 2020 • Weifeng Chen, Shengyi Qian, David Fan, Noriyuki Kojima, Max Hamilton, Jia Deng
Single-view 3D is the task of recovering 3D properties such as depth and surface normals from a single image.
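As a small worked example of the geometry relating these properties (not the paper's method), surface normals can be recovered from a depth map by finite differences, assuming an orthographic-like projection for simplicity.

```python
import numpy as np

def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Estimate per-pixel surface normals from a depth map using
    finite differences; a common post-processing step."""
    dz_dy, dz_dx = np.gradient(depth)
    # the normal is proportional to (-dz/dx, -dz/dy, 1), then normalized
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

depth = np.fromfunction(lambda y, x: 0.01 * x, (64, 64))  # tilted plane
normals = normals_from_depth(depth)
print(normals[32, 32])  # roughly (-0.01, 0, 1), normalized
```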
1 code implementation • ACL 2020 • Noriyuki Kojima, Hadar Averbuch-Elor, Alexander M. Rush, Yoav Artzi
Visual features are a promising signal for bootstrapping the learning of textual models.
no code implementations • CoNLL 2019 • Mahmoud Azab, Noriyuki Kojima, Jia Deng, Rada Mihalcea
We introduce a new embedding model that represents movie characters and their interactions in dialogue, encoding in a single representation both the language these characters use and information about the other participants in the dialogue.
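A toy sketch of the idea, assuming a simple composition of pretrained vectors rather than the paper's learned model: a character is represented by averaged word vectors from their utterances concatenated with the averaged embeddings of their co-participants. All names here are illustrative.

```python
import numpy as np

def character_embedding(utterances, participants, word_vecs, char_vecs):
    """Toy character representation: mean word vector of the character's
    utterances, concatenated with the mean embedding of co-participants."""
    lang = np.mean([word_vecs[w] for u in utterances for w in u.split()], axis=0)
    others = np.mean([char_vecs[p] for p in participants], axis=0)
    return np.concatenate([lang, others])

rng = np.random.default_rng(0)
word_vecs = {w: rng.normal(size=8) for w in "we should go now".split()}
char_vecs = {c: rng.normal(size=8) for c in ["Ann", "Bob"]}
emb = character_embedding(["we should go", "go now"], ["Ann", "Bob"],
                          word_vecs, char_vecs)
print(emb.shape)  # (16,)
```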
no code implementations • 26 Jul 2019 • Noriyuki Kojima, Jia Deng
In this paper, we compare learning-based and classical methods for navigation in virtual environments.
no code implementations • NAACL 2018 • Mahmoud Azab, Mingzhe Wang, Max Smith, Noriyuki Kojima, Jia Deng, Rada Mihalcea
We propose a new model for speaker naming in movies that leverages visual, textual, and acoustic modalities in a unified optimization framework.
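For intuition, here is a minimal late-fusion sketch: each modality scores the candidate speakers and a weighted sum picks the name. The paper's actual framework is a joint optimization; the function, weights, and scores below are assumptions for illustration.

```python
import numpy as np

def name_speaker(visual, textual, acoustic, weights=(0.4, 0.3, 0.3)):
    """Pick the candidate speaker maximizing a weighted combination of
    per-modality scores (one score per candidate from each modality)."""
    w_v, w_t, w_a = weights
    combined = w_v * visual + w_t * textual + w_a * acoustic
    return int(np.argmax(combined))

# toy scores for 3 candidate speakers
visual   = np.array([0.2, 0.7, 0.1])
textual  = np.array([0.5, 0.3, 0.2])
acoustic = np.array([0.1, 0.6, 0.3])
print(name_speaker(visual, textual, acoustic))  # -> 1
```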