Action Segmentation
72 papers with code • 9 benchmarks • 16 datasets
Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, Action Segmentation aims to partition a temporally untrimmed video along the time axis and label each resulting segment with one of a set of pre-defined action classes. The output of Action Segmentation can serve as input to downstream applications such as video-to-text generation and action localization.
Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
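As a concrete illustration of the output format described above, the sketch below collapses a per-frame label sequence into labeled temporal segments. This is a minimal, hypothetical example (the function name and toy labels are not from any cited paper), but it reflects the common convention of predicting one action label per frame and reporting contiguous runs as segments.

```python
from itertools import groupby

def frames_to_segments(frame_labels):
    """Collapse a per-frame label sequence into (label, start, end) segments.

    `end` is exclusive. Consecutive frames with the same action label
    form one temporal segment, which is the typical output of a
    temporal action segmentation model.
    """
    segments = []
    start = 0
    for label, group in groupby(frame_labels):
        length = sum(1 for _ in group)
        segments.append((label, start, start + length))
        start += length
    return segments

# A toy 10-frame video: background, then "pour", then "stir".
labels = ["bg", "bg", "pour", "pour", "pour", "stir", "stir", "stir", "bg", "bg"]
print(frames_to_segments(labels))
# → [('bg', 0, 2), ('pour', 2, 5), ('stir', 5, 8), ('bg', 8, 10)]
```

Evaluation metrics for this task (frame-wise accuracy, segmental edit distance, and segmental F1 at overlap thresholds) all operate on representations like the one returned here.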
Latest papers
OTAS: Unsupervised Boundary Detection for Object-Centric Temporal Action Segmentation
In this paper, we explore the merits of local features by proposing the unsupervised framework of Object-centric Temporal Action Segmentation (OTAS).
How Much Temporal Long-Term Context is Needed for Action Segmentation?
In this work, we try to answer how much long-term temporal context is required for temporal action segmentation by introducing a transformer-based model that leverages sparse attention to capture the full context of a video.
UnLoc: A Unified Framework for Video Localization Tasks
While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task.
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding
Understanding comprehensive assembly knowledge from videos is critical for futuristic ultra-intelligent industry.
Pretrained Language Models as Visual Planners for Human Assistance
Given a succinct natural language goal, e.g., "make a shelf", and a video of the user's progress so far, the aim of VPA is to devise a plan, i.e., a sequence of actions such as "sand shelf", "paint shelf", etc.
Leveraging triplet loss for unsupervised action segmentation
In this paper, we propose a novel fully unsupervised framework that learns action representations suitable for the action segmentation task from the single input video itself, without requiring any training data.
Timestamp-Supervised Action Segmentation from the Perspective of Clustering
Most existing methods generate pseudo-labels for all frames in each video to train the segmentation model.
Temporal Action Segmentation: An Analysis of Modern Techniques
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in minutes-long videos with multiple action classes.
Streaming Video Temporal Action Segmentation In Real Time
As the real-time action segmentation task differs from the TAS task, we define it as the streaming video real-time temporal action segmentation (SVTAS) task.
Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation
This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup.