Video Recognition

147 papers with code • 0 benchmarks • 10 datasets

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Most implemented papers

MoViNets: Mobile Video Networks for Efficient Video Recognition

tensorflow/models CVPR 2021

We present Mobile Video Networks (MoViNets), a family of computation- and memory-efficient video networks that can operate on streaming video for online inference.

Flow-Guided Feature Aggregation for Video Object Detection

msracver/Flow-Guided-Feature-Aggregation ICCV 2017

The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc.

A^2-Nets: Double Attention Networks

nguyenvo09/Double-Attention-Network NeurIPS 2018

Learning to capture long-range relations is fundamental to image/video recognition.

Sequence Level Semantics Aggregation for Video Object Detection

open-mmlab/mmtracking ICCV 2019

In this work, we argue that aggregating features at the full-sequence level will lead to more discriminative and robust features for video object detection.

Improved Residual Networks for Image and Video Recognition

iduta/iresnet 10 Apr 2020

We successfully train a 404-layer deep CNN on the ImageNet dataset and a 3002-layer network on CIFAR-10 and CIFAR-100, while the baseline is not able to converge at such extreme depths.

TAM: Temporal Adaptive Module for Video Recognition

liu-zhy/TANet ICCV 2021

Video data exhibits complex temporal dynamics due to various factors such as camera motion, speed variation, and different activities.

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

BestJuly/Inter-intra-video-contrastive-learning 6 Aug 2020

With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations.

Learning Equivariant Representations

daniilidis-group/spherical-cnn 4 Dec 2020

In this thesis, we extend equivariance to other kinds of transformations, such as rotation and scaling.

Towards Long-Form Video Understanding

chaoyuaw/lvu CVPR 2021

Our world offers a never-ending stream of visual stimuli, yet today's vision systems only accurately recognize patterns within a few seconds.

AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition

ShoufaChen/AdaptFormer 26 May 2022

To address this challenge, we propose an effective adaptation approach for Transformers, namely AdaptFormer, which can efficiently adapt pre-trained ViTs to many different image and video tasks.