Action Segmentation

73 papers with code • 9 benchmarks • 16 datasets

Action Segmentation is a challenging problem in high-level video understanding. In its simplest form, it aims to split a temporally untrimmed video into segments along the time axis and label each segment with one of a set of pre-defined action labels. The results of Action Segmentation can then be used as input to downstream applications such as video-to-text and action localization.

Source: TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation
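A minimal Python sketch of the task as defined above. It assumes per-frame labels are stored as a plain list of strings, and the label names are made up. It collapses frame-wise labels into (label, start, end) segments and computes frame-wise accuracy, one common evaluation measure for the task.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Collapse per-frame action labels into (label, start, end) segments.

    `end` is exclusive, so a segment covers frames start .. end - 1.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted label equals the ground-truth label."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)


gt = ["take", "take", "pour", "pour", "pour", "stir"]
pred = ["take", "pour", "pour", "pour", "pour", "stir"]
print(frames_to_segments(gt))  # [('take', 0, 2), ('pour', 2, 5), ('stir', 5, 6)]
print(frame_accuracy(pred, gt))  # 5 of 6 frames correct
```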

Unified Fully and Timestamp Supervised Temporal Action Segmentation via Sequence to Sequence Translation

boschresearch/uvast 1 Sep 2022

This paper introduces a unified framework for video action segmentation via sequence to sequence (seq2seq) translation in a fully and timestamp supervised setup.

28 stars
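The seq2seq view treats a video's segmentation as an ordered list of actions plus their durations. As a hedged illustration, and not the decoding scheme used in the paper, the hypothetical helper `segments_to_frames` below expands such a transcript with relative durations back into frame-level labels. It rounds cumulative boundaries and pins the last one to the video length, so the frame counts always sum to `num_frames`.

```python
def segments_to_frames(transcript, durations, num_frames):
    """Expand an ordered action transcript with relative durations into frame labels.

    `durations` are non-negative weights (e.g. predicted fractions of the video).
    Cumulative boundaries are rounded, so the output always has num_frames labels.
    """
    if not transcript or len(transcript) != len(durations):
        raise ValueError("need a non-empty transcript and one duration per action")
    if any(d < 0 for d in durations) or sum(durations) <= 0:
        raise ValueError("durations must be non-negative with a positive sum")
    total = sum(durations)
    frames, cum, prev = [], 0.0, 0
    for i, (label, d) in enumerate(zip(transcript, durations)):
        cum += d
        last = i == len(transcript) - 1
        # End frame of this segment; the last segment always ends at num_frames.
        boundary = num_frames if last else round(num_frames * cum / total)
        frames.extend([label] * (boundary - prev))
        prev = boundary
    return frames


print(segments_to_frames(["take", "pour", "stir"], [0.25, 0.5, 0.25], 8))
# ['take', 'take', 'pour', 'pour', 'pour', 'pour', 'stir', 'stir']
```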

RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks

ShangHua-Gao/RFNext 14 Jun 2022

Our search scheme combines a global search, which finds coarse combinations, with a local search that further refines the receptive field combinations.

62 stars
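A toy Python sketch of the coarse-to-fine idea in the excerpt. It is a simplified illustration, not RF-Next's actual algorithm. A global stage tries every combination of per-layer dilation rates from a coarse grid. A local stage then nudges one layer's rate at a time while the score improves. The `score` callable, the grid, and the step sizes are hypothetical stand-ins for training and validating a network.

```python
import itertools


def global_then_local_search(score, num_layers, coarse=(1, 2, 4, 8, 16), steps=(-1, 1), rounds=10):
    """Toy coarse-to-fine search over per-layer dilation rates.

    score: hypothetical callable mapping a tuple of dilation rates to a
    validation score (higher is better).
    """
    # Global stage: evaluate every combination of coarse rates.
    best = max(itertools.product(coarse, repeat=num_layers), key=score)
    best_score = score(best)
    # Local stage: nudge one layer's rate at a time while the score improves.
    for _ in range(rounds):
        improved = False
        for i in range(num_layers):
            for step in steps:
                cand = list(best)
                cand[i] = max(1, cand[i] + step)  # dilation rates stay >= 1
                cand = tuple(cand)
                s = score(cand)
                if s > best_score:
                    best, best_score = cand, s
                    improved = True
        if not improved:
            break
    return best, best_score


def toy_score(dilations, target=(3, 10)):
    """Hypothetical score that peaks at dilations (3, 10)."""
    return -sum((a - b) ** 2 for a, b in zip(dilations, target))


print(global_then_local_search(toy_score, num_layers=2))  # ((3, 10), 0)
```

The global stage evaluates `len(coarse) ** num_layers` candidates, so this exhaustive version is only practical for a handful of layers.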

Do we really need temporal convolutions in action segmentation?

ddz16/TUT 26 May 2022

Most state-of-the-art methods focus on designing temporal convolution-based models, but the inflexibility of temporal convolutions and the difficulties in modeling long-term temporal dependencies restrict the potential of these models.

0 stars

Cross-Enhancement Transformer for Action Segmentation

Wangjhdeveloper/CETNet 19 May 2022

Temporal convolutions have been the paradigm of choice in action segmentation; they enlarge the long-term receptive field by stacking more convolution layers.

3 stars
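The receptive-field growth described in the excerpt can be made concrete with the standard formula for stacked stride-1 1D convolutions, RF = 1 + sum over layers of (k_i - 1) * d_i, where k_i is the kernel size and d_i the dilation. The sketch below compares ten plain kernel-3 layers with ten kernel-3 layers whose dilation doubles at each layer. The doubling pattern is the one used by MS-TCN-style dilated residual layers.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field, in frames, of stacked stride-1 1D convolutions."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("need one dilation per layer")
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d  # each layer widens the view by (k - 1) * d frames
    return rf


layers = 10
plain = receptive_field([3] * layers, [1] * layers)
dilated = receptive_field([3] * layers, [2 ** i for i in range(layers)])
print(plain, dilated)  # 21 2047
```

At the same depth, doubling the dilation turns linear growth (21 frames) into exponential growth (2047 frames). This is how dilated stacks reach long temporal contexts without extra layers.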

Temporal Alignment Networks for Long-term Video

tengdahan/temporalalignnet CVPR 2022

The objective of this paper is a temporal alignment network that ingests long-term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment.

106 stars • 06 Apr 2022
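A hedged sketch of the two decisions named in the excerpt, using plain cosine similarity between a sentence embedding and per-frame embeddings. This is not the Temporal Alignment Network itself. The embeddings, the `threshold` value, and the rule "alignable if the most similar frame is similar enough" are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def align_sentence(sentence_emb, frame_embs, threshold=0.5):
    """Return (is_alignable, best_frame_index); the index is None if not alignable."""
    sims = [cosine(sentence_emb, f) for f in frame_embs]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] < threshold:  # hypothetical alignability rule
        return False, None
    return True, best


frames = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
print(align_sentence([1.0, 0.0], frames))  # (True, 1)
print(align_sentence([-1.0, 0.0], frames))  # (False, None)
```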

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

assembly101/assembly101.github.io CVPR 2022

Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles.

1 star • 28 Mar 2022

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

ttlmh/bridge-prompt CVPR 2022

The generated text prompts are paired with the corresponding video clips and used to co-train the text encoder and the video encoder with a contrastive approach.

88 stars • 26 Mar 2022
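The Bridge-Prompt excerpt says the text and video encoders are co-trained contrastively. One standard objective of that kind is a CLIP-style symmetric InfoNCE loss, sketched below in pure Python. Treat it as an assumption, since Bridge-Prompt's exact loss may differ. The i-th video embedding and the i-th text embedding form the positive pair, and all other pairings in the batch act as negatives. The `temperature` default is a hypothetical choice.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax_at(row, idx):
    """log softmax(row)[idx], computed stably by subtracting the row max."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_sum_exp


def symmetric_info_nce(video_embs, text_embs, temperature=0.07):
    """CLIP-style loss: video i and text i are positives, other pairs are negatives."""
    v = [_normalize(x) for x in video_embs]
    t = [_normalize(x) for x in text_embs]
    n = len(v)
    # logits[i][j] = cosine(video i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    video_to_text = -sum(_log_softmax_at(logits[i], i) for i in range(n)) / n
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_video = -sum(_log_softmax_at(columns[j], j) for j in range(n)) / n
    return (video_to_text + text_to_video) / 2


videos = [[1.0, 0.0], [0.0, 1.0]]
matched = symmetric_info_nce(videos, [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce(videos, [[0.0, 1.0], [1.0, 0.0]])
print(matched < swapped)  # True: correct pairings give a lower loss
```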

HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction

leolyliu/HOI4D-Instructions CVPR 2022

We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze research on category-level human-object interaction.

39 stars • 03 Mar 2022

Skeleton-Based Action Segmentation with Multi-Stage Spatial-Temporal Graph Convolutional Neural Networks

benjaminfiltjens/ms-gcn 3 Feb 2022

State-of-the-art action segmentation approaches use multiple stages of temporal convolutions.

26 stars

Set-Supervised Action Learning in Procedural Task Videos via Pairwise Order Consistency

ZijiaLewisLu/CVPR22-POC CVPR 2022

We address the problem of set-supervised action learning, whose goal is to learn an action segmentation model using weak supervision in the form of sets of actions occurring in training videos.

9 stars • 01 Jan 2022
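To make the weak label concrete: a set label records only which actions occur in a video, not their order, count, or timing. The hypothetical helpers below derive that set from frame labels, assuming a `background` label for non-action frames. They then score a predicted segmentation by the Jaccard overlap of its action set with the annotated set. This illustrates the supervision signal only, not the paper's pairwise order consistency method.

```python
def set_label(frame_labels, background="background"):
    """Weak label: which actions occur, ignoring order, repetition and timing."""
    return {a for a in frame_labels if a != background}


def set_jaccard(pred_frames, action_set, background="background"):
    """Jaccard overlap between the predicted and the annotated action sets."""
    pred_set = set_label(pred_frames, background)
    union = pred_set | action_set
    return len(pred_set & action_set) / len(union) if union else 1.0


gt_frames = ["background", "cut", "cut", "fry", "cut", "background"]
weak = set_label(gt_frames)
print(sorted(weak))  # ['cut', 'fry']
print(set_jaccard(["cut", "cut", "peel"], weak))  # 1 shared action out of 3
```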