Weakly Supervised Temporal Action Localization
31 papers with code • 1 benchmark • 2 datasets
Most implemented papers
Weakly-Supervised Action Localization by Generative Attention Modeling
By maximizing the conditional probability with respect to the attention, the action and non-action frames are well separated.
D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations
The proposed formulation comprises a discriminative and a denoising loss term for enhancing temporal action localization.
A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization
Moreover, our temporal semi-soft and hard attention modules, calculating two attention scores for each video snippet, help to focus on the less discriminative frames of an action to capture the full action boundary.
The Blessings of Unlabeled Background in Untrimmed Videos
The key challenge is how to distinguish the action of interest segments from the background, which is unlabelled even on the video-level.
CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
In this paper, we argue that learning by comparing helps identify these hard snippets and we propose to utilize snippet Contrastive learning to Localize Actions, CoLA for short.
Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization
In this work, we argue that the features extracted from the pretrained extractor, e.g., I3D, are not the WS-TAL task-specific features, thus the feature re-calibration is needed for reducing the task-irrelevant information redundancy.
Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization
To learn completeness from the obtained sequence, we introduce two novel losses that contrast action instances with background ones in terms of action score and feature similarity, respectively.
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization
In this paper, we present a framework named FAC-Net based on the I3D backbone, on which three branches are appended, named class-wise foreground classification branch, class-agnostic attention branch and multiple instance learning branch.
Background-Click Supervision for Temporal Action Localization
Weakly supervised temporal action localization aims at learning the instance-level action pattern from the video-level labels, where a significant challenge is action-context confusion.
ACGNet: Action Complement Graph Network for Weakly-supervised Temporal Action Localization
Weakly-supervised temporal action localization (WTAL) in untrimmed videos has emerged as a practical but challenging task since only video-level labels are available.