In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding.
Ranked #2 on Self-Supervised Action Recognition on HMDB51
ACTION RECOGNITION IN VIDEOS SELF-SUPERVISED ACTION RECOGNITION
Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
Ranked #5 on Action Classification on Moments in Time (Top 5 Accuracy metric)
ACTION CLASSIFICATION ACTION RECOGNITION ACTION RECOGNITION IN VIDEOS
The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of the temporal segment network.
Ranked #3 on Multimodal Activity Recognition on EV-Action
ACTION RECOGNITION ACTION RECOGNITION IN VIDEOS MULTIMODAL ACTIVITY RECOGNITION
Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information.
Ranked #21 on Action Recognition on UCF101
ACTION RECOGNITION ACTION RECOGNITION IN VIDEOS
However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.
Ranked #25 on Action Recognition on UCF101
ACTION RECOGNITION ACTION RECOGNITION IN VIDEOS DATA AUGMENTATION
We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset.
Ranked #1 on Action Recognition In Videos on Sports-1M
We propose a soft attention based model for the task of action recognition in videos.
ACTION RECOGNITION ACTION RECOGNITION IN VIDEOS
In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.
Ranked #12 on Action Recognition on UCF101
ACTION RECOGNITION ACTION RECOGNITION IN VIDEOS OPTICAL FLOW ESTIMATION
Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos.
ACTION RECOGNITION ACTION RECOGNITION IN VIDEOS
Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art.
Ranked #1 on Action Classification on HMDB51
ACTION CLASSIFICATION ACTION RECOGNITION ACTION RECOGNITION IN VIDEOS MULTI-TASK LEARNING OPTICAL FLOW ESTIMATION