Action Detection
235 papers with code • 11 benchmarks • 33 datasets
Action Detection aims to find both where and when an action occurs within a video clip and to classify what action is taking place. Results are typically given in the form of action tubelets: action bounding boxes linked across time in the video. This is related to temporal localization, which seeks to identify the start and end frame of an action, and to action recognition, which seeks only to classify which action is taking place and typically assumes a trimmed video.
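As a minimal sketch of the tubelet idea, the structure below links one bounding box per frame under a single action label; the `Tubelet` class and its fields are hypothetical, not part of any particular detector's API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: an action "tubelet" is a sequence of per-frame
# bounding boxes linked across time, plus one action label.
@dataclass
class Tubelet:
    label: str                                  # predicted action class
    start_frame: int                            # first frame of the temporal extent
    boxes: list = field(default_factory=list)   # one (x1, y1, x2, y2) box per frame

    @property
    def end_frame(self) -> int:
        # Last frame covered by the tubelet.
        return self.start_frame + len(self.boxes) - 1

# Build a 3-frame tubelet for a "running" action whose box drifts right.
t = Tubelet(label="running", start_frame=10)
for dx in range(3):
    t.boxes.append((50 + dx, 80, 120 + dx, 200))

print(t.label, t.start_frame, t.end_frame)  # running 10 12
```

Temporal localization would keep only `start_frame`/`end_frame`, while trimmed action recognition would keep only `label`.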
Most implemented papers
Actions as Moving Points
The existing action tubelet detectors often depend on heuristic anchor design and placement, which might be computationally expensive and sub-optimal for precise localization.
Harvesting Ambient RF for Presence Detection Through Deep Learning
With presence detection, how to collect training data with human presence can have a significant impact on the performance.
PaStaNet: Toward Human Activity Knowledge Engine
In light of this, we propose a new path: infer human part states first and then reason out the activities based on part-level semantics.
Asynchronous Interaction Aggregation for Action Detection
We propose the Asynchronous Interaction Aggregation network (AIA) that leverages different interactions to boost action detection.
VoxLingua107: a Dataset for Spoken Language Recognition
Speech activity detection and speaker diarization are used to extract segments from the videos that contain speech.
Generic Event Boundary Detection: A Benchmark for Event Segmentation
This paper presents a novel task together with a new benchmark for detecting generic, taxonomy-free event boundaries that segment a whole video into chunks.
Relaxed Transformer Decoders for Direct Action Proposal Generation
Extensive experiments on the THUMOS14 and ActivityNet-1.3 benchmarks demonstrate the effectiveness of RTD-Net on both tasks of temporal action proposal generation and temporal action detection.
ROAD: The ROad event Awareness Dataset for Autonomous Driving
We also report the performance on the ROAD tasks of SlowFast and YOLOv5 detectors, as well as that of the winners of the ICCV 2021 ROAD challenge, highlighting the challenges faced by situation awareness in autonomous driving.
End-to-end speaker segmentation for overlap-aware resegmentation
Experiments on multiple speaker diarization datasets conclude that our model can be used with great success on both voice activity detection and overlapped speech detection.
Long Short-Term Transformer for Online Action Detection
We present Long Short-term TRansformer (LSTR), a temporal modeling algorithm for online action detection, which employs a long- and short-term memory mechanism to model prolonged sequence data.