Search Results for author: Kuo-Hao Zeng

Found 14 papers, 4 papers with code

Seeing the Unseen: Visual Common Sense for Semantic Placement

no code implementations • 15 Jan 2024 • Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, Luca Weihs

Datasets for image description are typically constructed by curating relevant images and asking humans to annotate the contents of the image; neither of those two steps is straightforward for objects not present in the image.

Common Sense Reasoning, Object

Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

no code implementations • 5 Dec 2023 • Kiana Ehsani, Tanmay Gupta, Rose Hendrix, Jordi Salvador, Luca Weihs, Kuo-Hao Zeng, Kunal Pratap Singh, Yejin Kim, Winson Han, Alvaro Herrasti, Ranjay Krishna, Dustin Schwenk, Eli VanderBilt, Aniruddha Kembhavi

Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents.

Benchmarking, Image Augmentation, +3

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

no code implementations • 7 Nov 2023 • Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

Inspired by selective attention in humans (the process through which people filter their perception based on their experiences, knowledge, and the task at hand), we introduce a parameter-efficient approach to filter visual stimuli for embodied AI.

Object, Object Recognition

Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

no code implementations • 24 Apr 2023 • Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi

A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise.

Visual Navigation

Pushing it out of the Way: Interactive Visual Navigation

1 code implementation • CVPR 2021 • Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi

In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.

Navigate, Visual Navigation

AllenAct: A Framework for Embodied AI Research

1 code implementation • 28 Aug 2020 • Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi

The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities.

Embodied Question Answering, Instruction Following, +1

Style Example-Guided Text Generation using Generative Adversarial Transformers

no code implementations • 2 Mar 2020 • Kuo-Hao Zeng, Mohammad Shoeybi, Ming-Yu Liu

The style encoder extracts a style code from the reference example, and the text decoder generates texts based on the style code and the context.

Sentence, Text Generation

Visual Reaction: Learning to Play Catch with Your Drone

1 code implementation • CVPR 2020 • Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, Ali Farhadi

In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.

Omnidirectional CNN for Visual Place Recognition and Navigation

no code implementations • 12 Mar 2018 • Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun

Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place.

Navigate, Visual Place Recognition

Self-view Grounding Given a Narrated 360° Video

1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.

Sentence, Visual Grounding

Visual Forecasting by Imitating Dynamics in Natural Sequences

no code implementations • ICCV 2017 • Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles

This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.

Action Anticipation

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

no code implementations • CVPR 2017 • Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun

For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.

Accident Anticipation

Leveraging Video Descriptions to Learn Video Question Answering

no code implementations • 12 Nov 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Ching-Yao Chuang, Yuan-Hong Liao, Juan Carlos Niebles, Min Sun

Then, a large number of candidate QA pairs are automatically generated from descriptions rather than manually annotated.

Question Answering, Video Question Answering, +1

Title Generation for User Generated Videos

no code implementations • 25 Aug 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Juan Carlos Niebles, Min Sun

Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.

Sentence, Video Captioning
