Search Results for author: Kuo-Hao Zeng

Found 12 papers, 4 papers with code

Seeing the Unseen: Visual Common Sense for Semantic Placement

no code implementations • 15 Jan 2024 • Ram Ramrakhya, Aniruddha Kembhavi, Dhruv Batra, Zsolt Kira, Kuo-Hao Zeng, Luca Weihs

Datasets for image description are typically constructed by curating relevant images and asking humans to annotate the contents of the image; neither of those two steps is straightforward for objects not present in the image.

Common Sense Reasoning · Object

Selective Visual Representations Improve Convergence and Generalization for Embodied AI

no code implementations • 7 Nov 2023 • Ainaz Eftekhar, Kuo-Hao Zeng, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

Inspired by selective attention in humans (the process through which people filter their perception based on their experiences, knowledge, and the task at hand), we introduce a parameter-efficient approach to filter visual stimuli for embodied AI.

Object · Object Recognition
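
As an illustration of what a parameter-efficient filter over visual stimuli could look like, here is a minimal sketch using a small learned codebook that softly compresses visual features so only task-relevant information survives the bottleneck. All module and variable names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CodebookFilter(nn.Module):
    """Softly compress visual features through a small learned codebook,
    acting as a bottleneck that keeps only task-relevant information."""
    def __init__(self, feat_dim=512, num_codes=256):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codes, feat_dim))
        self.to_scores = nn.Linear(feat_dim, num_codes)

    def forward(self, visual_feats):                 # (B, feat_dim)
        weights = torch.softmax(self.to_scores(visual_feats), dim=-1)
        return weights @ self.codebook               # (B, feat_dim), filtered
```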

Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

no code implementations • 24 Apr 2023 • Kuo-Hao Zeng, Luca Weihs, Roozbeh Mottaghi, Ali Farhadi

A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise.

Visual Navigation
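
To make the distinction between action semantics and action impact concrete, the hedged sketch below embeds an action by the observation change it produced, so two actions with the same label but different effects receive different codes. Names and dimensions are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class ActionImpactEncoder(nn.Module):
    """Embed an action by the state change it caused, not by its label."""
    def __init__(self, obs_dim=512, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, obs_before, obs_after):        # each: (B, obs_dim)
        # Identical "move ahead" commands with different outcomes map to
        # different embeddings, since only the observed transition matters.
        return self.net(torch.cat([obs_before, obs_after], dim=-1))
```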

Pushing it out of the Way: Interactive Visual Navigation

1 code implementation • CVPR 2021 • Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi

In this paper, we study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.

Navigate · Visual Navigation

AllenAct: A Framework for Embodied AI Research

1 code implementation • 28 Aug 2020 • Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi

The domain of Embodied AI, in which agents learn to complete tasks through interaction with their environment from egocentric observations, has experienced substantial growth with the advent of deep reinforcement learning and increased interest from the computer vision, NLP, and robotics communities.

Embodied Question Answering · Instruction Following · +1

Style Example-Guided Text Generation using Generative Adversarial Transformers

no code implementations • 2 Mar 2020 • Kuo-Hao Zeng, Mohammad Shoeybi, Ming-Yu Liu

The style encoder extracts a style code from the reference example, and the text decoder generates texts based on the style code and the context.

Sentence · Text Generation
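
The abstract describes a style-encoder/text-decoder split. The sketch below shows that conditioning pattern with a simple GRU pair; the paper itself uses generative adversarial transformers, so treat this purely as an illustration of the interface, with every name and dimension assumed.

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Extract a fixed-size style code from a reference example."""
    def __init__(self, vocab=10000, dim=256, style_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.to_style = nn.Linear(dim, style_dim)

    def forward(self, reference_tokens):             # (B, T)
        _, h = self.rnn(self.embed(reference_tokens))
        return self.to_style(h[-1])                  # (B, style_dim) style code

class StyleDecoder(nn.Module):
    """Generate text conditioned on the style code and the context."""
    def __init__(self, vocab=10000, dim=256, style_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim + style_dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, context_tokens, style_code):
        x = self.embed(context_tokens)               # (B, T, dim)
        style = style_code.unsqueeze(1).expand(-1, x.size(1), -1)
        h, _ = self.rnn(torch.cat([x, style], dim=-1))
        return self.out(h)                           # (B, T, vocab) logits
```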

Visual Reaction: Learning to Play Catch with Your Drone

1 code implementation • CVPR 2020 • Kuo-Hao Zeng, Roozbeh Mottaghi, Luca Weihs, Ali Farhadi

In this paper we address the problem of visual reaction: the task of interacting with dynamic environments where the changes in the environment are not necessarily caused by the agent itself.

Omnidirectional CNN for Visual Place Recognition and Navigation

no code implementations • 12 Mar 2018 • Tsun-Hsuan Wang, Hung-Jui Huang, Juan-Ting Lin, Chan-Wei Hu, Kuo-Hao Zeng, Min Sun

Given a visual input, the task of the O-CNN is not to retrieve the matched place exemplar, but to retrieve the closest place exemplar and estimate the relative distance between the input and the closest place.

Navigate · Visual Place Recognition
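
To picture the retrieval-plus-regression idea, here is a hedged sketch: score a query feature against the stored place exemplars, pick the closest one, and regress the relative distance from the concatenated pair. Shapes and names are illustrative assumptions, not the O-CNN architecture itself.

```python
import torch
import torch.nn as nn

class PlaceRetriever(nn.Module):
    """Retrieve the closest place exemplar and estimate relative distance."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.dist_head = nn.Sequential(
            nn.Linear(2 * feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, query_feat, exemplar_feats):
        # query_feat: (feat_dim,); exemplar_feats: (num_places, feat_dim)
        sims = exemplar_feats @ query_feat           # similarity per exemplar
        idx = sims.argmax()                          # closest place, not an exact match
        pair = torch.cat([query_feat, exemplar_feats[idx]])
        return idx, self.dist_head(pair)             # index + estimated distance
```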

Self-view Grounding Given a Narrated 360° Video

1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.

Sentence · Visual Grounding
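
The "irrelevant loss" can be pictured as subtracting the reversed sentence's reconstruction loss from the objective: the model is rewarded for reconstructing the given sentence and penalized for reconstructing its reverse. A minimal sketch, where the weighting `lam` is an assumed hyperparameter:

```python
import torch.nn.functional as F

def joint_loss(logits_fwd, targets_fwd, logits_rev, targets_rev, lam=1.0):
    # logits_*: (B, T, vocab); targets_*: (B, T) token ids
    recon = F.cross_entropy(logits_fwd.transpose(1, 2), targets_fwd)
    # Negated reconstruction loss of the reversed sentence ("irrelevant
    # loss"): minimizing it drives the reverse away from the given sentence.
    irrelevant = -F.cross_entropy(logits_rev.transpose(1, 2), targets_rev)
    return recon + lam * irrelevant
```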

Visual Forecasting by Imitating Dynamics in Natural Sequences

no code implementations • ICCV 2017 • Kuo-Hao Zeng, William B. Shen, De-An Huang, Min Sun, Juan Carlos Niebles

This allows us to apply IRL at scale and directly imitate the dynamics in high-dimensional continuous visual sequences from the raw pixel values.

Action Anticipation
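
One standard recipe for applying IRL at scale to raw pixels is GAIL-style adversarial imitation, where a discriminator over observed transitions supplies the learned reward. The sketch below illustrates that general recipe under assumed input sizes; it is not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class TransitionDiscriminator(nn.Module):
    """Score how natural a pixel-level transition looks; the score can
    serve as a learned reward for the forecasting policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 4, stride=2), nn.ReLU(),   # frame pair: 2 x 3 channels
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, frame, next_frame):            # each: (B, 3, H, W)
        return self.net(torch.cat([frame, next_frame], dim=1))  # realism logit
```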

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

no code implementations • CVPR 2017 • Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun

For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.

Accident Anticipation

Title Generation for User Generated Videos

no code implementations • 25 Aug 2016 • Kuo-Hao Zeng, Tseng-Hung Chen, Juan Carlos Niebles, Min Sun

Finally, our sentence augmentation method also outperforms the baselines on the M-VAD dataset.

Sentence · Video Captioning
