Is Weakly-supervised Action Segmentation Ready For Human-Robot Interaction? No, Let's Improve It With Action-union Learning

Action segmentation plays an important role in enabling robots to automatically understand human activities. Obtaining action labels for every frame to train a segmentation model is costly, whereas annotating timestamp labels for weak supervision is cost-effective. However, existing methods may not fully exploit timestamp labels, which limits their performance. To alleviate this issue, we propose a novel learning pattern for the training stage that, for each unlabeled frame, maximizes the probability of the union of the actions at its surrounding timestamps. For the inference stage, we provide a new refinement solution that generates better hard-assigned action classes from soft-assigned predictions. Importantly, our methods are model-agnostic and can be applied to existing frameworks. On three commonly used action segmentation datasets, our method outperforms previous timestamp-supervision methods and achieves new state-of-the-art performance. Moreover, our method uses less than 1% of the labels required for full supervision to obtain comparable or even better results.
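The training-stage idea can be illustrated with a small sketch. The following is a minimal NumPy version that assumes per-frame class probabilities (rows summing to 1), timestamps given as a dict mapping frame index to action class, and that the "surrounding timestamps" of an unlabeled frame are the nearest annotated frames to its left and right. The function name `action_union_loss` is hypothetical. In this sketch, frames before the first timestamp or after the last are skipped. The paper's exact formulation, weighting, and boundary handling may differ.

```python
import numpy as np


def action_union_loss(probs, timestamps, eps=1e-8):
    """Hypothetical sketch of an action-union objective (not the authors' code).

    probs:      (T, C) array of per-frame class probabilities.
    timestamps: dict {frame_index: class_label} of annotated frames.
    Returns the mean negative log-likelihood over all scored frames.
    """
    frames = sorted(timestamps)
    losses = []

    # Annotated frames: ordinary cross-entropy on their timestamp label.
    for t in frames:
        losses.append(-np.log(probs[t, timestamps[t]] + eps))

    # Unlabeled frames strictly between two consecutive timestamps:
    # maximize P(y_t in {left action, right action}), i.e. the summed
    # probability over the union of the two neighboring labels.
    # (Assumption: frames outside the first/last timestamp are ignored.)
    for left, right in zip(frames[:-1], frames[1:]):
        union = sorted({timestamps[left], timestamps[right]})
        for t in range(left + 1, right):
            losses.append(-np.log(probs[t, union].sum() + eps))

    return float(np.mean(losses))


# Usage: 5 frames, 4 classes, timestamps at frame 0 (class 1) and frame 4 (class 2).
uniform = np.full((5, 4), 0.25)
print(action_union_loss(uniform, {0: 1, 4: 2}))  # approximately 1.4 * ln 2
```

Unlike assigning each unlabeled frame a single pseudo-label, the union term is satisfied whenever the probability mass falls on either neighboring action, so the model is not forced to commit to a boundary location for which it has no label.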


Datasets


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Weakly Supervised Action Localization | GTEA | AU-Action | mAP@0.1:0.7 | 76.9 | #1
Weakly Supervised Action Localization | GTEA | AU-Action | mAP@0.5 | 66.3 | #1
