Efficient Two-Step Networks for Temporal Action Segmentation

Due to boundary ambiguity and over-segmentation, labeling every frame in long untrimmed videos remains challenging. To address these problems, we present the Efficient Two-Step Network (ETSN), which has two components. The first step is Efficient Temporal Series Pyramid Networks (ETSPNet), which capture both local and global frame-level features and provide accurate predictions of segmentation boundaries. The second step is a novel unsupervised approach called Local Burr Suppression (LBS), which significantly reduces over-segmentation errors. Our empirical evaluations on benchmarks including the 50Salads, GTEA, and Breakfast datasets demonstrate that ETSN outperforms the current state-of-the-art methods by a large margin.
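To make the over-segmentation problem concrete, the sketch below applies a generic minimum-length filter to frame-wise predictions. This is not the paper's LBS algorithm, whose details are not given in the abstract; it is a hypothetical heuristic, and the names `to_segments`, `suppress_short_segments`, and `min_len` are assumptions introduced only for illustration.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def suppress_short_segments(labels, min_len):
    """Relabel every run shorter than min_len with the label of the frame before it.

    Hypothetical smoothing heuristic for illustration only (not the paper's LBS).
    A short run at the very start of the video is left unchanged.
    """
    out = list(labels)
    for _, start, end in to_segments(labels):
        if end - start < min_len and start > 0:
            out[start:end] = [out[start - 1]] * (end - start)
    return out


# A one-frame "mix" burr interrupts a long "cut" segment.
noisy = ["cut"] * 5 + ["mix"] * 1 + ["cut"] * 4 + ["mix"] * 6
smoothed = suppress_short_segments(noisy, min_len=3)
print(to_segments(noisy))
print(to_segments(smoothed))
```

In this toy input the single-frame "mix" run splits one "cut" action into two segments; after filtering, the prediction has two segments instead of four. A fixed length threshold like this can also delete genuinely short actions, which is one reason a dedicated method may be preferable.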


Results from the Paper


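Task                 Dataset    Model  Metric Name  Metric Value  Global Rank
Action Segmentation  50 Salads  ETSN   F1@10%       85.2          #12
Action Segmentation  50 Salads  ETSN   Edit         78.8          #15
Action Segmentation  50 Salads  ETSN   Acc          82.0          #22
Action Segmentation  50 Salads  ETSN   F1@25%       83.9          #13
Action Segmentation  50 Salads  ETSN   F1@50%       75.4          #15
Action Segmentation  Breakfast  ETSN   F1@10%       74.0          #18
Action Segmentation  Breakfast  ETSN   F1@50%       56.2          #15
Action Segmentation  Breakfast  ETSN   Acc          67.8          #22
Action Segmentation  Breakfast  ETSN   Edit         70.3          #19
Action Segmentation  Breakfast  ETSN   F1@25%       69.0          #16
Action Segmentation  GTEA       ETSN   F1@10%       91.1          #9
Action Segmentation  GTEA       ETSN   F1@50%       77.9          #12
Action Segmentation  GTEA       ETSN   Acc          78.2          #18
Action Segmentation  GTEA       ETSN   Edit         86.2          #11
Action Segmentation  GTEA       ETSN   F1@25%       90.0          #9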
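For readers unfamiliar with the metric names above, here is a minimal sketch of how Acc, Edit, and F1@k are commonly computed in the action segmentation literature. It assumes the usual definitions (frame-wise accuracy, normalized Levenshtein distance over segment label sequences, and segmental F1 at an IoU threshold) and is not taken from the ETSN paper's evaluation code. All function names are hypothetical.

```python
from itertools import groupby


def segments(labels):
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segs, start = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        segs.append((label, start, start + n))
        start += n
    return segs


def frame_accuracy(pred, gt):
    """Acc: percentage of frames whose predicted label matches the ground truth."""
    correct = sum(p == g for p, g in zip(pred, gt))
    return 100.0 * correct / len(gt)


def edit_score(pred, gt):
    """Edit: 100 * (1 - normalized Levenshtein distance) over segment label sequences."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    prev = list(range(len(g) + 1))
    for i, p_label in enumerate(p, 1):
        cur = [i]
        for j, g_label in enumerate(g, 1):
            cur.append(min(prev[j] + 1,                          # deletion
                           cur[j - 1] + 1,                       # insertion
                           prev[j - 1] + (p_label != g_label)))  # substitution
        prev = cur
    return 100.0 * (1 - prev[-1] / max(len(p), len(g)))


def f1_at(pred, gt, threshold):
    """F1@k: a predicted segment is a true positive if its IoU with an unmatched
    ground-truth segment of the same label is at least `threshold`."""
    p_segs, g_segs = segments(pred), segments(gt)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(min(pe, ge) - max(ps, gs), 0)
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= threshold and not matched[best_j]:
            tp += 1
            matched[best_j] = True
        else:
            fp += 1
    fn = len(g_segs) - tp
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


gt = ["a"] * 5 + ["b"] * 5
over_segmented = ["a", "a", "b", "a", "a"] + ["b"] * 5  # one wrong frame
print(frame_accuracy(over_segmented, gt))
print(edit_score(over_segmented, gt))
print(f1_at(over_segmented, gt, 0.5))
```

In this toy case a single mislabeled frame leaves frame accuracy at 90 but drops the Edit score to 50 and F1@50% to about 33, which shows why the segmental metrics are reported alongside Acc when over-segmentation is a concern.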
