Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing

5 Jul 2023 · Jie Fu, Junyu Gao, Changsheng Xu ·

Weakly-supervised audio-visual video parsing (WS-AVVP) aims to localize the temporal extents of audio, visual and audio-visual event instances as well as identify the corresponding event categories with only video-level category labels for training. Most previous methods pay much attention to refining the supervision for each modality or extracting fruitful cross-modality information for more reliable feature learning. None of them have noticed the imbalanced feature learning between different modalities in the task. In this paper, to balance the feature learning processes of different modalities, a dynamic gradient modulation (DGM) mechanism is explored, where a novel and effective metric function is designed to measure the imbalanced feature learning between audio and visual modalities. Furthermore, principle analysis indicates that the multimodal confusing calculation will hamper the precise measurement of multimodal imbalanced feature learning, which further weakens the effectiveness of our DGM mechanism. To cope with this issue, a modality-separated decision unit (MSDU) is designed for more precise measurement of imbalanced feature learning between audio and visual modalities. Comprehensive experiments are conducted on public benchmarks and the corresponding experimental results demonstrate the effectiveness of our proposed method.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Datasets

AVE

Results from the Paper

Edit

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

None

Edit Social Preview

Multimodal Imbalance-Aware Gradient Modulation for Weakly-supervised Audio-Visual Video Parsing

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Edit

Methods

Add Remove