Action Segmentation
72 papers with code • 9 benchmarks • 16 datasets
Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation aims to partition a temporally untrimmed video along the time axis and label each resulting segment with one of a set of pre-defined action classes. The output of Action Segmentation can serve as input to downstream applications such as video-to-text generation and action localization.
Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
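As a concrete illustration of the output format described above, the sketch below collapses a per-frame label sequence into labeled temporal segments. This is a minimal, hypothetical example (the function name and toy labels are not from any cited paper), but it reflects the common convention of predicting one action label per frame and reporting contiguous runs as segments.

```python
from itertools import groupby

def frames_to_segments(frame_labels):
    """Collapse a per-frame label sequence into (label, start, end) segments.

    `end` is exclusive. Consecutive frames with the same action label
    form one temporal segment, which is the typical output of a
    temporal action segmentation model.
    """
    segments = []
    start = 0
    for label, group in groupby(frame_labels):
        length = sum(1 for _ in group)
        segments.append((label, start, start + length))
        start += length
    return segments

# A toy 10-frame video: background, then "pour", then "stir".
labels = ["bg", "bg", "pour", "pour", "pour", "stir", "stir", "stir", "bg", "bg"]
print(frames_to_segments(labels))
# → [('bg', 0, 2), ('pour', 2, 5), ('stir', 5, 8), ('bg', 8, 10)]
```

Evaluation metrics for this task (frame-wise accuracy, segmental edit distance, and segmental F1 at overlap thresholds) all operate on representations like the one returned here.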
Latest papers
OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation
In this paper, we explore the merits of local features by proposing the unsupervised framework of Object-centric Temporal Action Segmentation (OTAS).
How Much Temporal Long-Term Context is Needed for Action Segmentation?
In this work, we try to answer how much long-term temporal context is required for temporal action segmentation by introducing a transformer-based model that leverages sparse attention to capture the full context of a video.
UnLoc: A Unified Framework for Video Localization Tasks
While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task.
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding
Understanding comprehensive assembly knowledge from videos is critical for futuristic ultra-intelligent industry.
Pretrained Language Models as Visual Planners for Human Assistance
Given a succinct natural language goal, e.g., "make a shelf", and a video of the user's progress so far, the aim of VPA is to devise a plan, i.e., a sequence of actions such as "sand shelf", "paint shelf", etc.
Leveraging triplet loss for unsupervised action segmentation
In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data.
Timestamp-Supervised Action Segmentation from the Perspective of Clustering
Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model.
Temporal Action Segmentation: An Analysis of Modern Techniques
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in minutes-long videos with multiple action classes.
Streaming Video Temporal Action Segmentation In Real Time
As the real-time action segmentation task differs from the TAS task, we define it as the streaming video real-time temporal action segmentation (SVTAS) task.
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup.