no code implementations • 24 Feb 2024 • Haotian Xia, Zhengbang Yang, Yuqing Wang, Rhys Tracy, Yun Zhao, Dongdong Huang, Zezhi Chen, Yan Zhu, Yuan-Fang Wang, Weining Shen
A deep understanding of sports, a field rich in strategic and dynamic content, is crucial for advancing Natural Language Processing (NLP).
1 code implementation • 26 Sep 2023 • Haotian Xia, Rhys Tracy, Yun Zhao, Yuqing Wang, Yuan-Fang Wang, Weining Shen
Our frameworks combine setting-ball trajectory recognition with a novel set-trajectory classifier to generate comprehensive, advanced match statistics.
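A minimal sketch of such a set-trajectory classifier, assuming hand-picked features over tracked (x, y) ball positions and illustrative class labels; the paper's actual pipeline is not shown on this page:

```python
# Hypothetical sketch: classify volleyball set trajectories from tracked
# ball coordinates. Features and labels here are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier

def trajectory_features(track):
    """track: (T, 2) array of (x, y) ball positions for one set."""
    track = np.asarray(track, dtype=float)
    dx = np.diff(track, axis=0)                # frame-to-frame motion
    return np.concatenate([
        track[0], track[-1],                   # start / end position
        [track[:, 1].max()],                   # peak height
        [np.linalg.norm(dx, axis=1).mean()],   # mean speed
    ])

# X_tracks: list of (T, 2) trajectories; y: set-type labels, e.g. 0=quick, 1=high
def fit_set_classifier(X_tracks, y):
    X = np.stack([trajectory_features(t) for t in X_tracks])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
    return clf.fit(X, y)
```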
no code implementations • 22 Aug 2023 • Rhys Tracy, Haotian Xia, Alex Rasla, Yuan-Fang Wang, Ambuj Singh
Our results show that applying GNNs to our graph encoding enables a far richer analysis of the data and noticeably improves prediction performance overall.
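A minimal message-passing sketch of the idea, assuming a hypothetical rally graph whose nodes are player/ball events; the encoding and readout below are illustrative, not the paper's:

```python
# Mean-aggregation message passing over an assumed rally graph.
import numpy as np

def gnn_layer(H, A, W):
    """H: (N, d) node features, A: (N, N) adjacency, W: (d, d) weights."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)
    msgs = (A @ H) / deg                     # average neighbor features
    return np.maximum((H + msgs) @ W, 0.0)   # ReLU update

def predict_rally_outcome(H, A, layers, w_out):
    for W in layers:                         # stack message-passing rounds
        H = gnn_layer(H, A, W)
    g = H.mean(axis=0)                       # graph-level readout
    return 1.0 / (1.0 + np.exp(-g @ w_out))  # P(win) for one team
```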
no code implementations • 28 Sep 2022 • Haotian Xia, Rhys Tracy, Yun Zhao, Erwan Fraisse, Yuan-Fang Wang, Linda Petzold
The second goal is to introduce a volleyball descriptive language that fully describes rally processes in games, and to apply this language to our dataset.
no code implementations • 9 Jan 2022 • Run-kun Lu, Jian-wei Liu, Yuan-Fang Wang, Hao-jie Xie, Xin Zuo
As is well known, the auto-encoder is a deep learning method that learns latent features of raw data by reconstructing its input. Building on this idea, we propose a novel algorithm called Auto-encoder based Co-training Multi-View Learning (ACMVL), which exploits both complementarity and consistency to find a joint latent feature representation of multiple views.
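A minimal autoencoder sketch of the reconstruction idea behind ACMVL; one such encoder per view could be co-trained, but the paper's joint-representation and co-training machinery is omitted here:

```python
# Minimal reconstruction-based autoencoder; dimensions are illustrative.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, in_dim)

    def forward(self, x):
        z = self.encoder(x)          # latent feature of the raw input
        return self.decoder(z), z    # reconstruction + representation

model = AutoEncoder(in_dim=128, latent_dim=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 128)                  # stand-in for one view's raw data
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x)   # learn by reconstructing the input
loss.backward(); opt.step()
```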
no code implementations • 8 Jan 2022 • Jian-wei Liu, Yuan-Fang Wang, Run-kun Lu, Xionglin Luo
However, not all of this information is useful for classification tasks.
2 code implementations • ICCV 2019 • Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context.
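A hypothetical sketch of the video-guided translation setup, assuming pre-extracted video features averaged into a context vector that conditions a small seq2seq model; the actual VATEX baselines are more elaborate:

```python
# Toy video-guided MT model; all sizes and the fusion step are assumptions.
import torch
import torch.nn as nn

class VideoGuidedMT(nn.Module):
    def __init__(self, vocab, d=256, video_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.enc = nn.GRU(d, d, batch_first=True)
        self.vproj = nn.Linear(video_dim, d)
        self.dec = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, src_ids, video_feats, tgt_ids):
        _, h = self.enc(self.embed(src_ids))        # source-sentence encoding
        v = self.vproj(video_feats.mean(dim=1))     # (B, d) video context
        h = h + v.unsqueeze(0)                      # inject spatiotemporal cue
        y, _ = self.dec(self.embed(tgt_ids), h)
        return self.out(y)                          # next-token logits
```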
no code implementations • CVPR 2019 • Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang, Larry S. Davis
In this paper, we present Moment Alignment Network (MAN), a novel framework that unifies the candidate moment encoding and temporal structural reasoning in a single-shot feed-forward network.
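A hedged sketch of single-shot moment scoring, assuming 1D-convolutional clip encoding, multi-scale candidate spans, and a dot-product match against a query embedding; MAN's iterative graph reasoning over moment-moment relations is not reproduced here:

```python
# Enumerate and score candidate moments in one feed-forward pass.
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=512, out_channels=256, kernel_size=3, padding=1)

def score_moments(clip_feats, query_emb, span_lengths=(4, 8, 16)):
    """clip_feats: (T, 512) per-clip features; query_emb: (256,)."""
    h = conv(clip_feats.t().unsqueeze(0)).squeeze(0).t()  # (T, 256)
    scores = {}
    T = h.size(0)
    for L in span_lengths:                        # multi-scale candidate spans
        for s in range(0, T - L + 1):
            moment = h[s:s + L].mean(dim=0)       # encode the span
            scores[(s, s + L)] = torch.dot(moment, query_emb)
    return scores  # highest-scoring (start, end) is the localized moment
```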
no code implementations • CVPR 2019 • Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out natural language instructions inside real 3D environments.
Ranked #2 on Vision-Language Navigation on Room2Room
no code implementations • 7 Aug 2018 • Da Zhang, Xiyang Dai, Yuan-Fang Wang
We further exploit the temporal context of activities by appropriately fusing multi-scale feature maps, and we demonstrate that both local and global temporal contexts are important.
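An illustrative sketch of the fusion step, assuming coarser (more global) temporal feature maps are upsampled to the finest resolution and concatenated so each position sees both local and global context:

```python
# Upsample-and-concatenate fusion of multi-scale temporal feature maps.
import torch
import torch.nn.functional as F

def fuse_temporal_scales(feature_maps):
    """feature_maps: list of (C_i, T_i) tensors, coarse to fine."""
    T = max(f.size(1) for f in feature_maps)   # finest temporal resolution
    up = [F.interpolate(f.unsqueeze(0), size=T, mode='linear',
                        align_corners=False).squeeze(0)
          for f in feature_maps]
    return torch.cat(up, dim=0)                # (sum C_i, T) fused map
```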
1 code implementation • 21 Jul 2018 • Da Zhang, Xiyang Dai, Xin Wang, Yuan-Fang Wang
In this paper, we present a novel Single Shot multi-Span Detector for temporal activity detection in long, untrimmed videos using a simple end-to-end fully three-dimensional convolutional (Conv3D) network.
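A minimal Conv3D sketch of the idea; the layer sizes and span-prediction head below are illustrative assumptions, not the paper's exact architecture:

```python
# Tiny fully 3D-convolutional detector producing dense per-timestep
# span offsets and class scores for untrimmed video.
import torch
import torch.nn as nn

class TinyConv3DDetector(nn.Module):
    def __init__(self, num_classes, num_anchors=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),   # keep time, pool space
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),    # collapse space only
        )
        # per time step: anchor-span offsets + class scores
        self.head = nn.Conv1d(64, num_anchors * (2 + num_classes), 1)

    def forward(self, video):                # video: (B, 3, T, H, W)
        f = self.backbone(video).flatten(2)  # (B, 64, T)
        return self.head(f)                  # dense predictions over time
```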
2 code implementations • ACL 2018 • Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang
Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams remains a largely untapped problem.
Ranked #13 on Visual Storytelling on VIST
1 code implementation • NAACL 2018 • Xin Wang, Yuan-Fang Wang, William Yang Wang
Furthermore, for the first time, we validate the superior performance of deep audio features on the video captioning task.
no code implementations • CVPR 2018 • Xin Wang, Wenhu Chen, Jiawei Wu, Yuan-Fang Wang, William Yang Wang
Video captioning is the task of automatically generating a textual description of the actions in a video.
Hierarchical Reinforcement Learning · Reinforcement Learning · +2
no code implementations • 31 Jan 2017 • Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang
In this paper we introduce a fully end-to-end approach for visual tracking in videos that learns to predict the bounding box locations of a target object at every frame.
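A hedged sketch of end-to-end frame-by-frame box regression, assuming pre-extracted frame features and a recurrent state that carries the target's history; the paper's actual network is not reproduced here:

```python
# Recurrent per-frame bounding-box regressor for tracking.
import torch
import torch.nn as nn

class BoxTracker(nn.Module):
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)   # (x, y, w, h) per frame

    def forward(self, frame_feats):        # frame_feats: (B, T, feat_dim)
        h, _ = self.rnn(frame_feats)       # state tracks the target's history
        return self.head(h)                # (B, T, 4) boxes, end to end
```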
2 code implementations • CVPR 2017 • Xin Wang, Geoffrey Oxholm, Da Zhang, Yuan-Fang Wang
That is, our scheme can generate results that are visually pleasing and better match multiple desired artistic styles, capturing color and texture cues at multiple scales.
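An illustrative multi-scale Gram-matrix style loss, sketching the "color and texture cues at multiple scales" idea under the standard neural style transfer formulation; the paper's multimodal scheme is more involved:

```python
# Gram-matrix style loss summed over feature maps from several depths.
import torch

def gram(f):                      # f: (C, H, W) feature map
    C, H, W = f.shape
    m = f.reshape(C, H * W)
    return m @ m.t() / (C * H * W)

def multiscale_style_loss(gen_feats, style_feats):
    """Lists of feature maps from several network depths (scales)."""
    return sum(torch.mean((gram(g) - gram(s)) ** 2)
               for g, s in zip(gen_feats, style_feats))
```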