Temporal Action Localization
421 papers with code • 14 benchmarks • 42 datasets
Temporal Action Localization aims to detect action instances in an untrimmed video stream and output their beginning and end timestamps. It is closely related to Temporal Action Proposal Generation.
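As a minimal illustration of the task's output, a localized action can be represented as a (start, end) interval, and predictions are usually matched to ground truth via temporal intersection-over-union (tIoU). The sketch below assumes plain tuples of timestamps in seconds; the segment values and threshold are illustrative, not drawn from any specific benchmark.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A detection typically also carries a class label and a confidence score;
# a prediction is counted correct when its tIoU with a ground-truth segment
# of the same class exceeds a threshold (e.g. 0.5).
pred = (12.0, 18.0)   # predicted action segment
gt = (10.0, 17.0)     # annotated ground-truth segment
print(temporal_iou(pred, gt))  # → 0.625
```

Benchmark metrics such as mean Average Precision are then computed over detections at one or several tIoU thresholds.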
Libraries
Use these libraries to find Temporal Action Localization models and implementations.
Datasets
Subtasks
Latest papers with no code
DeepLocalization: Using change point detection for Temporal Action Localization
In this study, we introduce DeepLocalization, an innovative framework devised for the real-time localization of actions tailored explicitly for monitoring driver behavior.
Learning to Score Sign Language with Two-stage Method
Human action recognition and performance assessment have been hot research topics in recent years.
Leveraging Temporal Contextualization for Video Action Recognition
We propose Temporal Contextualization (TC), a novel layer-wise temporal information infusion mechanism for video that extracts core information from each frame, interconnects relevant information across the video to summarize into context tokens, and ultimately leverages the context tokens during the feature encoding process.
Exploring Explainability in Video Action Recognition
To address these, we introduce Video-TCAV, which builds on TCAV for image classification tasks and aims to quantify the importance of specific concepts in the decision-making process of Video Action Recognition models.
Multimodal Attack Detection for Action Recognition Models
In addition, we analyze our method's real-time performance with different hardware setups to demonstrate its potential as a practical defense mechanism.
MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition
Our approach takes spatial features of different scales extracted by CNN and feeds them into a Multi-scale Embedding Layer (MELayer).
Localizing Moments of Actions in Untrimmed Videos of Infants with Autism Spectrum Disorder
This study is the first to conduct end-to-end temporal action localization in untrimmed videos of infants with ASD, offering promising avenues for early intervention and support.
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
PhysPT exploits a Transformer encoder-decoder backbone to effectively learn human dynamics in a self-supervised manner.
Language Model Guided Interpretable Video Action Reasoning
Extensive experiments on two complex video action datasets, Charades & CAD-120, validate the improved performance and interpretability of our LaIAR framework.
LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization
Temporal Action Localization (TAL) involves localizing and classifying action snippets in an untrimmed video.