Activity Recognition In Videos

9 papers with code • 1 benchmark • 2 datasets

Most implemented papers

Very Deep Convolutional Networks for Large-Scale Image Recognition

tensorflow/models 4 Sep 2014

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting.

Representation Flow for Action Recognition

piergiaj/representation-flow-cvpr19 CVPR 2019

Our representation flow layer is a fully-differentiable layer designed to capture the 'flow' of any representation channel within a convolutional neural network for action recognition.

Large-scale weakly-supervised pre-training for video action recognition

microsoft/computervision-recipes CVPR 2019

Frame-based models perform quite well on action recognition; is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning?

Pooled Motion Features for First-Person Videos

mryoo/pooled_time_series CVPR 2015

In this paper, we present a new feature representation for first-person videos.

Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters

piergiaj/latent-subevents 26 May 2016

In this paper, we introduce the concept of temporal attention filters and describe how they can be used for human activity recognition from videos.
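A minimal sketch of how a temporal attention filter can pool a video, assuming a DRAW-style formulation: N Gaussian filters laid along the time axis that share a center, a stride, and a width. The helper name `temporal_attention_filter` and the choice to express center and stride as fractions of the clip length are illustrative assumptions, not the authors' code. The point it shows is that the filters turn a variable-length sequence of per-frame features into a fixed number of pooled vectors.

```python
import math

def temporal_attention_filter(features, n_filters, center, stride, width):
    """Pool T per-frame feature vectors into n_filters vectors.

    Hypothetical helper: center and stride are fractions of the clip
    length (0..1); width is the Gaussian standard deviation in frames.
    """
    T = len(features)
    D = len(features[0])
    g = center * (T - 1)        # filter-bank center, in frames
    delta = stride * (T - 1)    # spacing between neighbouring filters, in frames
    pooled = []
    for i in range(n_filters):
        # place the n_filters means symmetrically around g
        mu = g + (i - n_filters / 2 + 0.5) * delta
        weights = [math.exp(-((t - mu) ** 2) / (2 * width ** 2)) for t in range(T)]
        z = sum(weights) or 1.0  # normalize; guard against all-zero weights
        pooled.append([
            sum(weights[t] * features[t][d] for t in range(T)) / z
            for d in range(D)
        ])
    return pooled
```

For example, pooling ten frames whose single feature equals the frame index with `temporal_attention_filter(features, 3, 0.5, 0.25, 1.0)` yields three vectors whose values increase from the first filter to the last, with the middle filter centered on frame 4.5. Because center, stride, and width are plain numbers, a network can predict them and learn where in the clip to attend.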

Convolutional Spiking Neural Networks for Spatio-Temporal Feature Extraction

aa-samad/conv_snn 27 Mar 2020

Spiking neural networks (SNNs) can be used in low-power and embedded systems (such as emerging neuromorphic chips) due to their event-based nature.
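A minimal sketch of the event-based behavior, using a generic discrete-time leaky integrate-and-fire (LIF) neuron. The function name `lif_neuron` and its parameters (`tau`, `v_thresh`, `v_reset`) are illustrative assumptions; this is a textbook neuron model, not the convolutional SNN from the paper. The neuron produces output only at the time steps where its membrane potential crosses the threshold, and that sparsity is what event-based hardware can exploit.

```python
import math

def lif_neuron(input_current, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """Return the time steps at which a leaky integrate-and-fire neuron spikes."""
    decay = math.exp(-dt / tau)  # membrane leak per time step
    v = v_reset
    spikes = []
    for t, i_t in enumerate(input_current):
        v = v * decay + i_t      # leak, then integrate the input
        if v >= v_thresh:        # threshold crossing emits an event
            spikes.append(t)
            v = v_reset          # reset after the spike
    return spikes
```

With a constant input of 0.2 the potential needs several steps to build up, so `lif_neuron([0.2] * 10)` returns `[5]`, while zero input produces no events at all.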

TorMentor: Deterministic dynamic-path, data augmentations with fractals

anguelos/tormentor 7 Apr 2022

We propose the use of fractals as a means of efficient data augmentation.
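A minimal sketch of fractal-based augmentation, assuming a plasma fractal generated with the diamond-square algorithm and applied as a smooth brightness perturbation to a grayscale image stored as nested lists in [0, 1]. The function names, the `roughness` and `strength` parameters, and the choice of brightness as the augmented property are illustrative assumptions rather than the TorMentor API. Seeding the generator makes each augmentation deterministic and reproducible.

```python
import random

def plasma_fractal(n, roughness=0.5, seed=0):
    """Return a (2**n + 1) x (2**n + 1) grid of fractal noise in [0, 1]."""
    rng = random.Random(seed)
    size = 2 ** n + 1
    grid = [[0.0] * size for _ in range(size)]
    for r, c in ((0, 0), (0, size - 1), (size - 1, 0), (size - 1, size - 1)):
        grid[r][c] = rng.random()
    step, scale = size - 1, 1.0
    while step > 1:
        half = step // 2
        # diamond step: each square's center gets the mean of its 4 corners plus noise
        for r in range(half, size, step):
            for c in range(half, size, step):
                avg = (grid[r - half][c - half] + grid[r - half][c + half]
                       + grid[r + half][c - half] + grid[r + half][c + half]) / 4
                grid[r][c] = avg + rng.uniform(-scale, scale)
        # square step: each edge midpoint gets the mean of its in-bounds neighbours plus noise
        for r in range(0, size, half):
            for c in range((r + half) % step, size, step):
                nbrs = [grid[rr][cc]
                        for rr, cc in ((r - half, c), (r + half, c), (r, c - half), (r, c + half))
                        if 0 <= rr < size and 0 <= cc < size]
                grid[r][c] = sum(nbrs) / len(nbrs) + rng.uniform(-scale, scale)
        step, scale = half, scale * roughness  # finer detail, smaller amplitude
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in grid]

def fractal_brightness_augment(image, strength=0.2, seed=0):
    """Shift pixel brightness by a smooth fractal field, clipped to [0, 1]."""
    h, w = len(image), len(image[0])
    n = max(1, (max(h, w) - 1).bit_length())  # smallest grid covering the image
    field = plasma_fractal(n, seed=seed)
    return [[min(1.0, max(0.0, image[r][c] + strength * (field[r][c] - 0.5)))
             for c in range(w)] for r in range(h)]
```

Because the field is regenerated from the seed, calling `fractal_brightness_augment(img, seed=3)` twice yields identical outputs, while a different seed gives a different but equally smooth perturbation.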

Dual-path Adaptation from Image to Video Transformers

park-jungin/dualpath CVPR 2023

In this paper, we efficiently transfer the superior representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters.
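To make "only a few trainable parameters" concrete, here is a minimal sketch of a generic bottleneck adapter: a small down-projection, a nonlinearity, and an up-projection added residually to a frozen backbone's hidden vector. The names `make_adapter`, `adapter_forward`, and `count_params`, the ReLU nonlinearity, and the sizes (hidden width 768, bottleneck 64) are assumptions for illustration; this is not claimed to be the exact DualPath architecture.

```python
import random

def make_adapter(d_model, bottleneck, seed=0):
    """Hypothetical adapter parameters: the only weights that would be trained."""
    rng = random.Random(seed)
    return {
        "down": [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(bottleneck)],
        "b_down": [0.0] * bottleneck,
        # zero-initialized up-projection: the adapter starts as an identity map
        "up": [[0.0] * bottleneck for _ in range(d_model)],
        "b_up": [0.0] * d_model,
    }

def adapter_forward(x, p):
    """x + up(relu(down(x))): a residual update to a frozen hidden vector."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["down"], p["b_down"])]
    delta = [sum(w * hi for w, hi in zip(row, h)) + b
             for row, b in zip(p["up"], p["b_up"])]
    return [xi + di for xi, di in zip(x, delta)]

def count_params(p):
    """Total number of scalar weights in the adapter."""
    return (sum(len(row) for row in p["down"]) + len(p["b_down"])
            + sum(len(row) for row in p["up"]) + len(p["b_up"]))
```

For a hidden width of 768 and a bottleneck of 64, one adapter has 2 × 768 × 64 + 64 + 768 = 99,136 trainable parameters, a small fraction of a ViT-B backbone that stays frozen. Zero-initializing the up-projection means the pretrained model's outputs are unchanged at the start of training.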