no code implementations • 23 Jan 2024 • Shih-Han Chou, Matthew Kowal, Yasmin Niknam, Diana Moyano, Shayaan Mehdi, Richard Pito, Cheng Zhang, Ian Knopke, Sedef Akinli Kocak, Leonid Sigal, Yalda Mohsenzadeh
Towards a solution for designing this ability in algorithms, we present a large-scale analysis on an in-house dataset collected by the Reuters News Agency, called the Reuters Video-Language News (ReutersViLNews) dataset, which focuses on high-level video-language understanding with an emphasis on long-form news.
no code implementations • 14 Mar 2023 • Shih-Han Chou, James J. Little, Leonid Sigal
We show that our commonsense knowledge enhanced approach produces significant improvements on this task (up to 57% in METEOR and 8.5% in CIDEr), as well as the state-of-the-art result on more traditional video captioning in the ActivityNet Captions dataset [29].
1 code implementation • 4 Nov 2020 • Tanzila Rahman, Shih-Han Chou, Leonid Sigal, Giuseppe Carenini
We also propose a multimodal fusion module to combine visual and textual information.
no code implementations • 10 Jan 2020 • Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang
We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360 image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions.
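The cubemap projection mentioned above can be illustrated with a minimal sketch. It covers only the geometric step of sampling one cube face from an equirectangular image, not the VQA model or its multi-resolution aggregation. The function name `equirect_to_cube_face`, the `FACES` orientation table, and the nearest-neighbour sampling are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical face convention: (forward, right, up) unit axes for each face.
FACES = {
    "front": ((0, 0, 1), (1, 0, 0), (0, 1, 0)),
    "right": ((1, 0, 0), (0, 0, -1), (0, 1, 0)),
    "back": ((0, 0, -1), (-1, 0, 0), (0, 1, 0)),
    "left": ((-1, 0, 0), (0, 0, 1), (0, 1, 0)),
    "up": ((0, 1, 0), (1, 0, 0), (0, 0, -1)),
    "down": ((0, -1, 0), (1, 0, 0), (0, 0, 1)),
}


def equirect_to_cube_face(image, face, size):
    """Sample one size x size cube face from an equirectangular image."""
    height, width = image.shape[:2]
    forward, right, up = (np.asarray(v, dtype=float) for v in FACES[face])

    # Pixel centres on the face plane, spanning [-1, 1] in both directions.
    ticks = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    u, v = np.meshgrid(ticks, -ticks)  # negate so that row 0 is the top

    # Viewing direction for every face pixel, normalised to unit length.
    dirs = forward + u[..., None] * right + v[..., None] * up
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Direction -> longitude/latitude -> equirectangular pixel indices.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])        # 0 at the front
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))   # positive is up
    cols = ((lon / (2 * np.pi) + 0.5) * width).astype(int) % width
    rows = np.clip(((0.5 - lat / np.pi) * height).astype(int), 0, height - 1)

    # Nearest-neighbour lookup; a real pipeline would likely interpolate.
    return image[rows, cols]
```

Calling the function once per entry in `FACES` yields six undistorted perspective views, which is the kind of input a cubemap-based model would consume in place of the distorted equirectangular image.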
no code implementations • 3 Oct 2019 • Shih-Han Chou, Cheng Sun, Wen-Yen Chang, Wan-Ting Hsu, Min Sun, Jianlong Fu
In this paper, our goal is to provide a standard dataset to facilitate the vision and machine learning communities in the 360° domain.
1 code implementation • 23 Nov 2017 • Shih-Han Chou, Yi-Chun Chen, Kuo-Hao Zeng, Hou-Ning Hu, Jianlong Fu, Min Sun
The negative log reconstruction loss of the reverse sentence (referred to as "irrelevant loss") is jointly minimized to encourage the reverse sentence to be different from the given sentence.
no code implementations • CVPR 2017 • Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, Min Sun
For survival, a living agent must have the ability to assess risk (1) by temporally anticipating accidents before they occur, and (2) by spatially localizing risky regions in the environment to move away from threats.