Action Detection
235 papers with code • 11 benchmarks • 33 datasets
Action Detection aims to find both where and when an action occurs within a video clip and to classify which action is taking place. Results are typically given as action tubelets: sequences of action bounding boxes linked across time in the video. The task is related to temporal action localization, which seeks only to identify the start and end frames of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
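As a minimal sketch of the tubelet representation described above (not tied to any specific paper or library; the class and field names are illustrative), a tubelet can be modeled as an action label plus per-frame boxes linked in temporal order:

from dataclasses import dataclass, field

@dataclass
class ActionTubelet:
    """An action instance localized in both space and time."""
    label: str                                  # action class, e.g. "jumping"
    start_frame: int                            # first frame the action appears in
    boxes: list = field(default_factory=list)   # one (x1, y1, x2, y2) box per frame

    @property
    def end_frame(self) -> int:
        return self.start_frame + len(self.boxes) - 1

    def extend(self, box):
        """Link a detection from the next frame onto this tubelet."""
        self.boxes.append(box)

# Usage: a three-frame tubelet for one detected action.
tube = ActionTubelet(label="jumping", start_frame=10, boxes=[(12, 30, 60, 110)])
tube.extend((14, 28, 62, 108))
tube.extend((15, 27, 63, 107))
print(tube.label, tube.start_frame, tube.end_frame)  # jumping 10 12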
Libraries
Use these libraries to find Action Detection models and implementations
Datasets
Subtasks
Latest papers
Centre Stage: Centricity-based Audio-Visual Temporal Action Detection
Previous one-stage action detection approaches have modelled temporal dependencies using only the visual modality.
ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors
ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels.
Boundary Discretization and Reliable Classification Network for Temporal Action Detection
Specifically, the boundary discretization module (BDM) elegantly merges anchor-based and anchor-free approaches in the form of boundary discretization, avoiding the handcrafted anchor design required by traditional mixed methods.
ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios
ENIGMA-51 is a new egocentric dataset acquired in an industrial scenario by 19 subjects who followed instructions to complete the repair of electrical boards using industrial tools (e.g., an electric screwdriver) and equipment (e.g., an oscilloscope).
Temporal Action Localization with Enhanced Instant Discriminability
Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video.
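Predicted segments in TAD are conventionally matched to ground truth using temporal IoU (tIoU), the overlap-over-union of two (start, end) intervals. A minimal sketch of this standard measure follows; the function name is illustrative, not drawn from a specific evaluation toolkit:

def temporal_iou(pred, gt):
    """tIoU between two (start, end) segments given in frames or seconds."""
    p_start, p_end = pred
    g_start, g_end = gt
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0

# A prediction spanning [12.0, 20.0] vs. ground truth [10.0, 18.0]:
print(temporal_iou((12.0, 20.0), (10.0, 18.0)))  # 0.6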
COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
We present COMEDIAN, a novel pipeline to initialize spatiotemporal transformers for action spotting, which involves self-supervised learning and knowledge distillation.
Progression-Guided Temporal Action Detection in Videos
The framework locates actions in videos by detecting the action evolution process.
Memory-and-Anticipation Transformer for Online Action Understanding
Based on this idea, we present Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach, to address the online action detection and anticipation tasks.
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
Two metrics are proposed to evaluate AER performance with automatic segmentation based on time-weighted emotion and speaker classification errors.
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development
We introduce "ivrit.ai", a comprehensive Hebrew speech dataset, addressing the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) technology in Hebrew.