Efficient Two-Step Networks for Temporal Action Segmentation
Due to boundary ambiguity and over-segmentation, labeling every frame in long untrimmed videos remains challenging. To address these problems, we present the Efficient Two-Step Network (ETSN), which consists of two steps. The first step is Efficient Temporal Series Pyramid Networks (ETSPNet), which capture both local and global frame-level features and provide accurate predictions of segmentation boundaries. The second step is a novel unsupervised approach called Local Burr Suppression (LBS), which significantly reduces over-segmentation errors. Our empirical evaluations on the 50Salads, GTEA, and Breakfast benchmarks demonstrate that ETSN outperforms the current state-of-the-art methods by a large margin.
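To make the over-segmentation problem concrete, the sketch below shows a frame-level prediction in which a single misclassified frame splits one action into three segments, and a simple minimum-length filter that merges the spurious run back into its predecessor. This is not LBS: the abstract does not specify how LBS works, so the filter here is a generic stand-in, and the names `to_segments`, `suppress_short_segments`, and `min_len` are hypothetical.

```python
from itertools import groupby


def to_segments(labels):
    """Collapse a frame-level label sequence into (label, length) runs."""
    return [(label, len(list(run))) for label, run in groupby(labels)]


def suppress_short_segments(labels, min_len):
    """Relabel runs shorter than min_len with the label emitted just before them.

    Hypothetical baseline for illustration only; it is NOT the paper's LBS.
    A short run at the very start of the sequence is left unchanged.
    """
    out = []
    for label, length in to_segments(labels):
        if length < min_len and out:
            label = out[-1]  # absorb the short run into the preceding segment
        out.extend([label] * length)
    return out


# One stray "mix" frame splits a single "cut" action into three segments.
frames = ["cut"] * 5 + ["mix"] + ["cut"] * 4 + ["mix"] * 6
smoothed = suppress_short_segments(frames, min_len=3)

print(to_segments(frames))    # four segments before filtering
print(to_segments(smoothed))  # two segments after filtering
```

A fixed length threshold like this can also erase genuinely short actions, which is one reason over-segmentation is usually handled with more careful methods.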
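Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Action Segmentation | 50 Salads | ETSN | F1@10% | 85.2 | # 12 |
Action Segmentation | 50 Salads | ETSN | Edit | 78.8 | # 15 |
Action Segmentation | 50 Salads | ETSN | Acc | 82.0 | # 22 |
Action Segmentation | 50 Salads | ETSN | F1@25% | 83.9 | # 13 |
Action Segmentation | 50 Salads | ETSN | F1@50% | 75.4 | # 15 |
Action Segmentation | Breakfast | ETSN | F1@10% | 74.0 | # 18 |
Action Segmentation | Breakfast | ETSN | F1@50% | 56.2 | # 15 |
Action Segmentation | Breakfast | ETSN | Acc | 67.8 | # 22 |
Action Segmentation | Breakfast | ETSN | Edit | 70.3 | # 19 |
Action Segmentation | Breakfast | ETSN | F1@25% | 69.0 | # 16 |
Action Segmentation | GTEA | ETSN | F1@10% | 91.1 | # 9 |
Action Segmentation | GTEA | ETSN | F1@50% | 77.9 | # 12 |
Action Segmentation | GTEA | ETSN | Acc | 78.2 | # 18 |
Action Segmentation | GTEA | ETSN | Edit | 86.2 | # 19 |
Action Segmentation | GTEA | ETSN | F1@25% | 90.0 | # 9 |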
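
For readers unfamiliar with the metric names in the table, the sketch below computes frame-wise accuracy (Acc), the segmental edit score (Edit), and segmental F1 at an IoU threshold (F1@k) as they are commonly defined in the action segmentation literature. It is an illustration under that assumption; the evaluation code behind the reported numbers may differ in details such as background handling, and all function names here are hypothetical.

```python
from itertools import groupby


def segments(labels):
    """Return (label, start, end) runs of a frame-level sequence, end exclusive."""
    out, start = [], 0
    for label, run in groupby(labels):
        n = len(list(run))
        out.append((label, start, start + n))
        start += n
    return out


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred, gt):
    """100 * (1 - normalized Levenshtein distance) over segment label sequences."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    prev = list(range(len(g) + 1))
    for i in range(1, len(p) + 1):
        cur = [i]
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return 100.0 * (1 - prev[-1] / max(len(p), len(g)))


def f1_at_k(pred, gt, k):
    """Segmental F1: a predicted segment is a true positive if its IoU with an
    unmatched ground-truth segment of the same label is at least k."""
    p_segs, g_segs = segments(pred), segments(gt)
    matched = [False] * len(g_segs)
    tp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= k and not matched[best_j]:
            matched[best_j] = True
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(p_segs)
    recall = tp / len(g_segs)
    return 100.0 * 2 * precision * recall / (precision + recall)


gt = ["a"] * 5 + ["b"] * 5
shifted = ["a"] * 4 + ["b"] * 6                        # boundary off by one frame
fragmented = ["a"] * 2 + ["b"] + ["a"] * 2 + ["b"] * 5  # one stray frame

for name, pred in [("shifted", shifted), ("fragmented", fragmented)]:
    print(name, frame_accuracy(pred, gt), edit_score(pred, gt), f1_at_k(pred, gt, 0.5))
```

Both toy predictions get exactly one frame wrong, so their frame accuracy is identical, but the fragmented one scores much lower on Edit and F1@50%. This is why the segmental metrics are reported alongside Acc: they penalize over-segmentation that frame accuracy barely registers.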