Action Segmentation
72 papers with code • 9 benchmarks • 16 datasets
Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, it aims to divide a temporally untrimmed video into segments along the time axis and label each segment with one of a set of predefined action labels. The results can then serve as input to downstream applications such as video-to-text and action localization.
Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
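As a rough illustration of the task's output, the sketch below collapses a sequence of per-frame action labels into labeled temporal segments. The function name `frames_to_segments`, the label strings, and the half-open `(label, start, end)` frame-index convention are assumptions made for this example; they are not taken from TricorNet or any paper listed on this page.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    This convention is an assumption made for the example.
    """
    segments = []
    start = 0
    # groupby yields one group per run of consecutive identical labels
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical frame-wise predictions for a short cooking video
frame_labels = ["background"] * 3 + ["cut_tomato"] * 4 + ["mix"] * 2
for label, start, end in frames_to_segments(frame_labels):
    print(f"{label}: frames {start} to {end - 1}")
```

A model that predicts one label per frame can be turned into a segmentation this way; the quality of the resulting segments then depends on how temporally consistent the frame-wise predictions are.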
Latest papers
LOGO: A Long-Form Video Dataset for Group Action Quality Assessment
Action quality assessment (AQA) has become an emerging topic since it can be extensively applied in numerous scenarios.
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Weakly-supervised action segmentation is the task of learning to partition a long video into several action segments, where training videos are accompanied only by transcripts (ordered lists of actions).
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Existing video-language studies mainly focus on learning from short video clips, leaving long-term temporal dependencies rarely explored due to the excessive computational cost of modeling long videos.
A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation
Effectively modeling discriminative spatio-temporal information is essential for segmenting activities in long action sequences.
Activity Grammars for Temporal Action Segmentation
Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties.
Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
Instead, we propose a novel methodology which performs the adaptation leveraging existing labeled exocentric videos and a new set of unlabeled, synchronized exocentric-egocentric video pairs, for which temporal action segmentation annotations do not need to be collected.
SigFormer: Sparse Signal-Guided Transformer for Multi-Modal Human Action Segmentation
Nowadays, the majority of approaches concentrate on the fusion of dense signals (i.e., RGB, optical flow, and depth maps).
Is Weakly-supervised Action Segmentation Ready For Human-Robot Interaction? No, Let's Improve It With Action-union Learning
To alleviate this issue, we propose a novel learning pattern for the training stage, which maximizes the probability of the action union of surrounding timestamps for unlabeled frames.
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning
The end-to-end SVTAS, which regards TAS as an action segment clustering task, can expand the application scenarios of TAS, and RL is used to alleviate the problem of inconsistent optimization objective and direction.