Action Detection
235 papers with code • 11 benchmarks • 33 datasets
Action Detection aims to find both where and when an action occurs within a video clip, and to classify which action is taking place. Typically, results are given in the form of action tubelets, which are action bounding boxes linked across time in the video. This is related to temporal localization, which seeks to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
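To make the notion of a tubelet concrete, the sketch below links per-frame box detections into tubelets with a simple greedy IoU rule. It is a minimal illustration under stated assumptions, not the method of any paper listed on this page: the names `Tubelet`, `iou`, and `link_detections`, the (x1, y1, x2, y2) box format, and the 0.5 IoU threshold are all hypothetical choices. A tubelet's first and last frames give its temporal extent, which is what temporal localization predicts on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Spatial intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Tubelet:
    """A hypothetical tubelet: one action label plus a box per frame."""
    label: str
    boxes: Dict[int, Box] = field(default_factory=dict)  # frame index -> box

    @property
    def start_frame(self) -> int:
        return min(self.boxes)

    @property
    def end_frame(self) -> int:
        return max(self.boxes)


def link_detections(frames: List[List[Tuple[Box, str]]],
                    iou_thresh: float = 0.5) -> List[Tubelet]:
    """Greedily link per-frame (box, label) detections into tubelets.

    A detection extends the same-label tubelet whose box in the previous
    frame overlaps it most (IoU >= iou_thresh); otherwise it starts a new
    tubelet. This is an assumed, simplified linking rule.
    """
    tubelets: List[Tubelet] = []
    for t, detections in enumerate(frames):
        # Only tubelets that had a box in the previous frame can be extended.
        active = [tb for tb in tubelets if tb.end_frame == t - 1]
        for box, label in detections:
            best, best_iou = None, iou_thresh
            for tb in active:
                if tb.label != label:
                    continue
                overlap = iou(tb.boxes[t - 1], box)
                if overlap >= best_iou:
                    best, best_iou = tb, overlap
            if best is None:
                tubelets.append(Tubelet(label, {t: box}))
            else:
                best.boxes[t] = box
                active.remove(best)  # at most one box per tubelet per frame
    return tubelets


if __name__ == "__main__":
    frames = [
        [((10, 10, 50, 50), "run")],
        [((12, 11, 52, 51), "run")],
        [((14, 12, 54, 52), "run"), ((200, 200, 240, 240), "wave")],
    ]
    for tb in link_detections(frames):
        print(tb.label, tb.start_frame, tb.end_frame)
    # prints:
    # run 0 2
    # wave 2 2
```

Greedy frame-by-frame linking is the simplest choice and breaks easily when a detection is missed for a single frame; more robust linkers typically score whole paths across the video (for example with dynamic programming) instead of committing one frame at a time.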
Latest papers with no code
Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization
The proposed method can take audio-visual input and leverage the speaker's acoustic footprint or lip track to flexibly conduct audio-based, video-based, and audio-visual speaker diarization in a unified sequence-to-sequence framework.
Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments
Speech separation involves extracting an individual speaker's voice from a multi-speaker audio signal.
Self-supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions
Our experiments show that self-supervised pretraining not only improves performance in clean conditions, but also yields models which are more robust to adverse conditions compared to purely supervised learning.
SADA: Semantic adversarial unsupervised domain adaptation for Temporal Action Localization
Temporal Action Localization (TAL) is a complex task that poses significant challenges, particularly when attempting to generalize to new, unseen domains in real-world applications.
Spatiotemporal Event Graphs for Dynamic Scene Understanding
In this thesis, we present a series of frameworks for dynamic scene understanding, ranging from road event detection from an autonomous driving perspective to complex video activity detection, followed by continual learning approaches for the lifelong learning of the models.
Low-power, Continuous Remote Behavioral Localization with Event Cameras
However, observing wild species at remote locations remains a challenging task due to difficult lighting conditions and constraints on power supply and data storage.
Towards More Practical Group Activity Detection: A New Benchmark and Model
Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video.
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
To this end, we design effective cross-snippet propagation modules that gradually exchange short-term video information among different snippets at two levels.
SPIRE-SIES: A Spontaneous Indian English Speech Corpus
Transcripts for 23 hours of speech are generated and validated, and can serve as a spontaneous speech ASR benchmark.
ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization
This paper addresses the challenge of point-supervised temporal action detection, in which only one frame per action instance is annotated in the training set.