Action Detection

235 papers with code • 11 benchmarks • 33 datasets

Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given in the form of action tubelets, which are action bounding boxes linked across time in the video. This is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
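To make the idea of a tubelet concrete, the following is a minimal sketch of how per-frame detections might be linked across time into tubelets. It assumes a simple greedy rule (extend a tubelet with the same-label box that has the highest IoU with its last box), which is an illustrative choice and not the method of any particular paper or library. The names `Tubelet`, `iou`, `link_detections`, and the `iou_threshold` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Detection = Tuple[str, Box]              # (action label, bounding box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tubelet:
    """Hypothetical container: one action label plus one box per frame."""
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    @property
    def start(self) -> int:
        return min(self.boxes)

    @property
    def end(self) -> int:
        return max(self.boxes)


def link_detections(frames: List[List[Detection]],
                    iou_threshold: float = 0.5) -> List[Tubelet]:
    """Greedily link per-frame detections into tubelets (assumed rule)."""
    finished: List[Tubelet] = []
    active: List[Tubelet] = []
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        still_active: List[Tubelet] = []
        for tube in active:
            last_box = tube.boxes[tube.end]
            best, best_iou = None, iou_threshold
            for det in unmatched:
                label, box = det
                if label != tube.label:
                    continue
                score = iou(last_box, box)
                if score >= best_iou:
                    best, best_iou = det, score
            if best is None:
                finished.append(tube)      # no match: the tubelet ends here
            else:
                tube.boxes[t] = best[1]    # extend the tubelet by one frame
                unmatched.remove(best)
                still_active.append(tube)
        # Every detection left over starts a new tubelet.
        for label, box in unmatched:
            still_active.append(Tubelet(label, {t: box}))
        active = still_active
    return finished + active


# Toy input: a "run" box drifting right for two frames, an empty frame,
# then an unrelated "jump" detection.
frames = [
    [("run", (0.0, 0.0, 10.0, 10.0))],
    [("run", (1.0, 0.0, 11.0, 10.0))],
    [],
    [("jump", (50.0, 50.0, 60.0, 60.0))],
]
tubes = link_detections(frames)
for tube in tubes:
    print(tube.label, tube.start, tube.end)
```

Each resulting tubelet answers all three questions at once: its boxes say where the action is, its first and last frame indices say when, and its label says what. Temporal localization corresponds to keeping only the start and end frames, and action recognition to keeping only the label.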

Latest papers with no code

Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization

no code yet • 16 Jan 2024

The proposed method can take audio-visual input and leverage the speaker's acoustic footprint or lip track to flexibly conduct audio-based, video-based, and audio-visual speaker diarization in a unified sequence-to-sequence framework.

Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments

no code yet • 7 Jan 2024

Speech separation involves extracting an individual speaker's voice from a multi-speaker audio signal.

Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions

no code yet • 27 Dec 2023

Our experiments show that self-supervised pretraining not only improves performance in clean conditions, but also yields models which are more robust to adverse conditions compared to purely supervised learning.

SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization

no code yet • 20 Dec 2023

Temporal Action Localization (TAL) is a complex task that poses relevant challenges, particularly when attempting to generalize on new -- unseen -- domains in real-world applications.

Spatiotemporal Event Graphs for Dynamic Scene Understanding

no code yet • 11 Dec 2023

In this thesis, we present a series of frameworks for dynamic scene understanding starting from road event detection from an autonomous driving perspective to complex video activity detection, followed by continual learning approaches for the life-long learning of the models.

Low-power, Continuous Remote Behavioral Localization with Event Cameras

no code yet • 6 Dec 2023

However, observing wild species at remote locations remains a challenging task due to difficult lighting conditions and constraints on power supply and data storage.

Towards More Practical Group Activity Detection: A New Benchmark and Model

no code yet • 5 Dec 2023

Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video.

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

no code yet • 4 Dec 2023

To this end, we design effective cross-snippet propagation modules to gradually exchange short-term video information among different snippets from two levels.

SPIRE-SIES: A Spontaneous Indian English Speech Corpus

no code yet • 1 Dec 2023

Transcripts for 23 hours are generated and validated, which can serve as a spontaneous speech ASR benchmark.

ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization

no code yet • 27 Nov 2023

This paper addresses the challenge of point-supervised temporal action detection, in which only one frame per action instance is annotated in the training set.