Boundary-Aware Cascade Networks for Temporal Action Segmentation
Identifying human action segments in an untrimmed video remains challenging due to boundary ambiguity and over-segmentation. To address these problems, we present a new boundary-aware cascade network that introduces two novel components. First, we devise a new cascading paradigm, called Stage Cascade, which gives our model adaptive receptive fields and more confident predictions for ambiguous frames. Second, we design a general and principled smoothing operation, termed local barrier pooling, which aggregates local predictions by leveraging semantic boundary information. Moreover, these two components can be jointly fine-tuned in an end-to-end manner. We perform experiments on three challenging datasets, 50Salads, GTEA, and Breakfast, demonstrating that our framework significantly outperforms the current state-of-the-art methods. The code is available at https://github.com/MCG-NJU/BCN.
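To make the idea of boundary-aware smoothing concrete, here is a minimal NumPy sketch. It is not the paper's local barrier pooling: the abstract only says that the operation aggregates local predictions using semantic boundary information. The function name `barrier_smooth`, the parameters `window` and `alpha`, the exponential decay, and the convention that `barrier[i]` scores the boundary between frames `i-1` and `i` are all assumptions made for illustration.

```python
import numpy as np

def barrier_smooth(probs, barrier, window=2, alpha=1.0):
    """Average each frame's prediction with its neighbours, down-weighting
    neighbours that lie across a likely boundary (illustrative sketch only).

    probs:   (T, C) frame-wise class probabilities
    barrier: (T,) boundary strength in [0, 1]; barrier[i] is assumed to score
             the boundary between frame i-1 and frame i
    """
    probs = np.asarray(probs, dtype=float)
    barrier = np.asarray(barrier, dtype=float)
    T = probs.shape[0]
    out = np.zeros_like(probs)
    for t in range(T):
        total = probs[t].copy()  # the centre frame always has weight 1
        norm = 1.0
        for direction in (-1, 1):
            w = 1.0
            prev = t
            for _ in range(window):
                s = prev + direction
                if s < 0 or s >= T:
                    break
                # The edge between prev and s is indexed by the larger frame.
                w *= np.exp(-alpha * barrier[max(prev, s)])
                total += w * probs[s]
                norm += w
                prev = s
        out[t] = total / norm
    return out

# Toy example: class 0 for three frames, then class 1 for three frames.
probs = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
sharp = barrier_smooth(probs, barrier=[0, 0, 0, 1, 0, 0], alpha=50.0)
blurred = barrier_smooth(probs, barrier=np.zeros(6))
print(sharp[2], blurred[2])
```

With a strong barrier at the class change, frames on either side keep their own predictions. With no barrier, the same call reduces to a plain moving average that blurs the transition.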
Tasks
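Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Action Segmentation | 50 Salads | BCN | F1@10% | 82.3 | # 18 |
Action Segmentation | 50 Salads | BCN | Edit | 74.3 | # 19 |
Action Segmentation | 50 Salads | BCN | Acc | 84.4 | # 15 |
Action Segmentation | 50 Salads | BCN | F1@25% | 81.3 | # 18 |
Action Segmentation | 50 Salads | BCN | F1@50% | 74 | # 16 |
Action Segmentation | Breakfast | BCN | F1@10% | 68.7 | # 21 |
Action Segmentation | Breakfast | BCN | F1@50% | 55.0 | # 18 |
Action Segmentation | Breakfast | BCN | Acc | 70.4 | # 16 |
Action Segmentation | Breakfast | BCN | Edit | 66.2 | # 23 |
Action Segmentation | Breakfast | BCN | F1@25% | 65.5 | # 21 |
Action Segmentation | GTEA | BCN | F1@10% | 88.5 | # 18 |
Action Segmentation | GTEA | BCN | F1@50% | 77.3 | # 14 |
Action Segmentation | GTEA | BCN | Acc | 79.8 | # 12 |
Action Segmentation | GTEA | BCN | Edit | 84.4 | # 16 |
Action Segmentation | GTEA | BCN | F1@25% | 87.1 | # 18 |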
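
The metric names in the table follow the usual temporal action segmentation convention: Acc is frame-wise accuracy, Edit is a segmental edit score, and F1@k is a segmental F1 score at an IoU threshold of k. The sketch below shows how these are commonly computed from per-frame label sequences. It is not the evaluation script behind the numbers above: the helper names are hypothetical, and dataset-specific details such as background-class filtering are omitted.

```python
def segments(labels):
    """Collapse a per-frame label sequence into (label, start, end) runs, end exclusive."""
    segs, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segs.append((labels[start], start, i))
            start = i
    return segs

def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)

def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def edit_score(pred, gt):
    """Normalized edit score over segment label orderings, ignoring durations."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    longest = max(len(p), len(g))
    return 100.0 if longest == 0 else 100.0 * (1 - levenshtein(p, g) / longest)

def f1_at_k(pred, gt, k):
    """Segmental F1: a predicted segment is a true positive if its best
    same-label ground-truth match has IoU >= k and is not already matched."""
    p_segs, g_segs = segments(pred), segments(gt)
    used = [False] * len(g_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (gl, gs, ge) in enumerate(g_segs):
            if gl != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            if inter / union > best_iou:
                best_iou, best_j = inter / union, j
        if best_j >= 0 and best_iou >= k and not used[best_j]:
            tp, used[best_j] = tp + 1, True
        else:
            fp += 1
    fn = len(g_segs) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 100.0 * 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gt = [0, 0, 0, 0, 1, 1, 1, 1]
over_segmented = [0, 0, 1, 0, 1, 1, 1, 1]  # one wrong frame splits a segment
print(frame_accuracy(over_segmented, gt), edit_score(over_segmented, gt), f1_at_k(over_segmented, gt, 0.5))
```

In this toy example a single wrong frame costs little frame accuracy but noticeably lowers the Edit and F1 scores, which is why the segmental metrics are reported alongside Acc when over-segmentation is a concern.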