Greatest papers with code

Self-Supervised MultiModal Versatile Networks

NeurIPS 2020 deepmind/deepmind-research

In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding.

ACTION RECOGNITION IN VIDEOS, SELF-SUPERVISED ACTION RECOGNITION

Temporal Segment Networks for Action Recognition in Videos

8 May 2017 open-mmlab/mmaction

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

Ranked #5 on Action Classification on Moments in Time (Top 5 Accuracy metric)

ACTION CLASSIFICATION, ACTION RECOGNITION, ACTION RECOGNITION IN VIDEOS

Convolutional Two-Stream Network Fusion for Video Action Recognition

CVPR 2016 feichtenhofer/twostreamfusion

Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information.

ACTION RECOGNITION, ACTION RECOGNITION IN VIDEOS

Towards Good Practices for Very Deep Two-Stream ConvNets

8 Jul 2015 yjxiong/caffe

However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.

ACTION RECOGNITION, ACTION RECOGNITION IN VIDEOS, DATA AUGMENTATION

Learning Spatiotemporal Features with 3D Convolutional Networks

ICCV 2015 open-mmlab/mmaction2

We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset.

ACTION RECOGNITION, ACTION RECOGNITION IN VIDEOS
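
As a rough illustration of what "3-dimensional convolution" means for video, the sketch below slides a kernel over time as well as over height and width, so a single filter can respond to motion. It is a toy in plain Python, not the architecture or code from the paper above; the function name `conv3d_valid`, the 4-frame clip, and the 2x1x1 temporal-difference kernel are all assumptions made for the example.

```python
def conv3d_valid(clip, kernel):
    """Slide a 3D kernel over a single-channel clip indexed as [t][y][x].

    No padding, stride 1 ("valid" mode). Like most deep-learning layers,
    this computes a cross-correlation (the kernel is not flipped).
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kernel's temporal and spatial extent.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip (hypothetical data): 4 frames of 4x4 pixels, with one bright
# pixel on row 1 that moves one column to the right in each frame.
clip = [[[1.0 if (y == 1 and x == t) else 0.0 for x in range(4)]
         for y in range(4)] for t in range(4)]

# The same layout with no motion: the bright pixel stays at column 0.
static_clip = [[[1.0 if (y == 1 and x == 0) else 0.0 for x in range(4)]
                for y in range(4)] for t in range(4)]

# Hypothetical 2x1x1 kernel: next frame minus current frame at the same pixel.
temporal_diff = [[[-1.0]], [[1.0]]]

response = conv3d_valid(clip, temporal_diff)
static_response = conv3d_valid(static_clip, temporal_diff)
```

With the moving clip, the response is negative where the pixel leaves and positive where it arrives; with the static clip, every response is zero. A purely 2D kernel applied frame by frame could not tell these two clips apart from a single frame.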

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

CVPR 2018 kevin-ssy/Optical-Flow-Guided-Feature

In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.

ACTION RECOGNITION, ACTION RECOGNITION IN VIDEOS, OPTICAL FLOW ESTIMATION