Search Results for author: Shih-Han Chou

Found 7 papers, 2 papers with code

Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews)

no code implementations • 23 Jan 2024 • Shih-Han Chou, Matthew Kowal, Yasmin Niknam, Diana Moyano, Shayaan Mehdi, Richard Pito, Cheng Zhang, Ian Knopke, Sedef Akinli Kocak, Leonid Sigal, Yalda Mohsenzadeh

Towards a solution for designing this ability in algorithms, we present a large-scale analysis on an in-house dataset collected by the Reuters News Agency, called the Reuters Video-Language News (ReutersViLNews) dataset, which focuses on high-level video-language understanding with an emphasis on long-form news.

Miscellaneous, Video Description

Implicit and Explicit Commonsense for Multi-sentence Video Captioning

no code implementations • 14 Mar 2023 • Shih-Han Chou, James J. Little, Leonid Sigal

We show that our commonsense knowledge enhanced approach produces significant improvements on this task (up to 57% in METEOR and 8.5% in CIDEr), as well as the state-of-the-art result on more traditional video captioning in the ActivityNet Captions dataset [29].

Imitation Learning, Sentence

Visual Question Answering on 360° Images

no code implementations • 10 Jan 2020 • Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang

We then study two different VQA models on VQA 360°, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360° image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.

Question Answering, Visual Question Answering

360-Indoor: Towards Learning Real-World Objects in 360° Indoor Equirectangular Images

no code implementations • 3 Oct 2019 • Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu

In this paper, our goal is to provide a standard dataset to facilitate the vision and machine learning communities in 360° domain.

Object, object-detection

Self-view Grounding Given a Narrated 360° Video

1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun

The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.

Sentence, Visual Grounding

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

no code implementations • CVPR 2017 • Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun

For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.

Accident Anticipation
