Action Classification

227 papers with code • 24 benchmarks • 30 datasets


Latest papers with no code

Learning Correlation Structures for Vision Transformers

no code yet • 5 Apr 2024

We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention.
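StructSA builds on the key-query correlations that ordinary self-attention already computes. As background only, here is a minimal sketch of standard scaled dot-product attention weights, softmax(QK^T / sqrt(d)). It is not StructSA itself, and the function names and toy matrices are hypothetical. Each row of the resulting map is the correlation pattern between one query and every key, which is the structure the excerpt says StructSA exploits.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries: Matrix, keys: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d)): one row of attention weights per query."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    # Raw key-query correlation scores, scaled by 1/sqrt(d).
    scores = [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]
    # Normalize each query's scores over all keys.
    return [softmax(row) for row in scores]


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in attention_map(Q, K):
        print([round(w, 3) for w in row])
```

In this toy input the first query has equal dot products with the first and third keys, so it assigns them equal weight and gives less to the second key.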

Classification of Tennis Actions Using Deep Learning

no code yet • 4 Feb 2024

Recent advances in deep learning make it possible to identify specific events in videos with greater precision.

Robustness Evaluation of Machine Learning Models for Robot Arm Action Recognition in Noisy Environments

no code yet • 17 Jan 2024

This paper studies robot arm action recognition in noisy environments using machine learning techniques.

No More Shortcuts: Realizing the Potential of Temporal Self-Supervision

no code yet • 20 Dec 2023

To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.

ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room

no code yet • 19 Dec 2023

Surgical robotics holds much promise for improving patient safety and clinician experience in the Operating Room (OR).

AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding

no code yet • 28 Nov 2023

Under the weak supervision setting, action labels are provided for the whole video without precise start and end times of the action clip.
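To make the weak-supervision setting concrete, the sketch below assumes a model that outputs one score per class for each video snippet, and trains only against the whole-video label. It uses top-k mean pooling, a common multiple-instance-learning aggregation, followed by binary cross-entropy on a multi-hot video label. This is a generic illustration, not AdaFocus's method; the function names and the choice of k are hypothetical.

```python
import math
from typing import List


def topk_mean_pool(snippet_scores: List[List[float]], k: int) -> List[float]:
    """Aggregate per-snippet class scores (T x C) into one score per class.

    For each class, average its k highest snippet scores. No start or end
    times are needed, only the video-level label is used downstream.
    """
    num_classes = len(snippet_scores[0])
    k = max(1, min(k, len(snippet_scores)))  # clamp k to the range [1, T]
    pooled = []
    for c in range(num_classes):
        column = sorted((row[c] for row in snippet_scores), reverse=True)
        pooled.append(sum(column[:k]) / k)
    return pooled


def video_level_bce(pooled: List[float], video_labels: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between pooled probabilities and multi-hot video labels."""
    total = 0.0
    for p, y in zip(pooled, video_labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(pooled)


if __name__ == "__main__":
    # Four snippets, two classes; the video is labeled with class 0 only.
    scores = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.1], [0.2, 0.3]]
    pooled = topk_mean_pool(scores, k=2)
    print(pooled, video_level_bce(pooled, [1, 0]))
```

Averaging the top k snippets, rather than taking only the maximum, spreads the training signal over several snippets that likely contain the action.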

ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization

no code yet • 27 Nov 2023

This paper addresses the challenge of point-supervised temporal action detection, in which only one frame per action instance is annotated in the training set.
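Point supervision means each action instance contributes exactly one labeled frame. The sketch below shows one simple way such annotations could be turned into per-frame training targets, marking every other frame as ignored because the annotation says nothing about action boundaries. The (frame_index, class_id) format and the IGNORE marker are hypothetical conventions for illustration, not ADM-Loc's actionness distribution modeling.

```python
from typing import List, Tuple

IGNORE = -1  # hypothetical marker for frames that carry no annotation


def point_labels_to_frame_targets(num_frames: int, points: List[Tuple[int, int]]) -> List[int]:
    """Build a per-frame target list from point annotations.

    `points` holds (frame_index, class_id) pairs, one per action instance.
    All remaining frames are set to IGNORE so a loss can skip them.
    """
    targets = [IGNORE] * num_frames
    for frame_idx, class_id in points:
        if not 0 <= frame_idx < num_frames:
            raise ValueError(f"frame index {frame_idx} out of range")
        targets[frame_idx] = class_id
    return targets


if __name__ == "__main__":
    # An 8-frame video with two action instances, each annotated at a single frame.
    print(point_labels_to_frame_targets(8, [(2, 0), (6, 1)]))
```

Only 2 of the 8 frames receive a label here, which illustrates how sparse the supervision is compared with full start/end annotation.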

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

no code yet • 9 Nov 2023

We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video) and another autoregressive component for the context modalities, which are not necessarily aligned in time but are still sequential.

OmniVec: Learning robust representations with cross modal sharing

no code yet • 7 Nov 2023

We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, which allows us to achieve state-of-the-art results on most of the benchmarks.

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

no code yet • 6 Nov 2023

AMD achieves 73.3% classification accuracy using the ViT-B model on the Something-Something V2 dataset, a 3.7% improvement over the original ViT-B model from VideoMAE.