no code implementations • 15 Jan 2024 • Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, Luca Weihs
Datasets for image description are typically constructed by curating relevant images and asking humans to annotate the contents of the image; neither of those two steps are straightforward for objects not present in the image.
no code implementations • 5 Dec 2023 • Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi
Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents.
no code implementations • 7 Nov 2023 • Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna
Inspired by selective attention in humans-the process through which people filter their perception based on their experiences, knowledge, and the task at hand-we introduce a parameter-efficient approach to filter visual stimuli for embodied AI.
no code implementations • 24 Apr 2023 • Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi
A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise.
1 code implementation • CVPR 2021 • Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi
In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
1 code implementation • 28 Aug 2020 • Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi
The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities.
no code implementations • 2 Mar 2020 • Kuo-Hao Zeng, Mohammad Shoeybi, Ming-Yu Liu
The style encoder extracts a style code from the reference example, and the text decoder generates texts based on the style code and the context.
1 code implementation • CVPR 2020 • Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, Ali Farhadi
In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.
no code implementations • 12 Mar 2018 • Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun
Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place.
1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun
The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.
no code implementations • ICCV 2017 • Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles
This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.
no code implementations • CVPR 2017 • Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun
For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.
no code implementations • 12 Nov 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, Min Sun
Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated.
no code implementations • 25 Aug 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Juan Carlos Niebles, Min Sun
Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.