Boundary-Aware Cascade Networks for Temporal Action Segmentation

Identifying human action segments in an untrimmed video is still challenging due to boundary ambiguity and over-segmentation issues. To address these problems, we present a new boundary-aware cascade network by introducing two novel components. First, we devise a new cascading paradigm, called Stage Cascade, to enable our model to have adaptive receptive fields and more confident predictions for ambiguous frames. Second, we design a general and principled smoothing operation, termed as local barrier pooling, to aggregate local predictions by leveraging semantic boundary information. Moreover, these two components can be jointly fine-tuned in an end-to-end manner. We perform experiments on three challenging datasets: 50Salads, GTEA and Breakfast dataset, demonstrating that our framework significantly out-performs the current state-of-the-art methods. The code is available at https://github.com/MCG-NJU/BCN.

PDF Abstract

Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Action Segmentation 50 Salads BCN F1@10% 82.3 # 18
Edit 74.3 # 19
Acc 84.4 # 15
F1@25% 81.3 # 18
F1@50% 74 # 16
Action Segmentation Breakfast BCN F1@10% 68.7 # 21
F1@50% 55.0 # 18
Acc 70.4 # 16
Edit 66.2 # 23
F1@25% 65.5 # 21
Action Segmentation GTEA BCN F1@10% 88.5 # 18
F1@50% 77.3 # 14
Acc 79.8 # 12
Edit 84.4 # 16
F1@25% 87.1 # 18

Methods


No methods listed for this paper. Add relevant methods here