1 code implementation • 18 Jun 2023 • Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Eng Siong Chng
In this paper, we aim to learn the shared representations across modalities to bridge their gap.
1 code implementation • 14 Jun 2023 • Meng Shen, Yizheng Huang, Jianxiong Yin, Heqing Zou, Deepu Rajan, Simon See
Our studies demonstrate that the proposed method achieves more balanced multimodal learning by avoiding greedy sample selection from the dominant modality.
1 code implementation • 16 May 2023 • Heqing Zou, Meng Shen, Chen Chen, Yuchen Hu, Deepu Rajan, Eng Siong Chng
Multimodal learning aims to imitate humans in acquiring complementary information from multiple modalities for various downstream tasks.
1 code implementation • 16 May 2023 • Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiushi Zhu, Eng Siong Chng
However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for the downstream speech recognition task.
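The snippet above contrasts plain concatenation with explicit cross-modal interaction. Below is a minimal, hypothetical sketch of one such interaction, cross-attention, where each audio frame attends over the visual frames before fusion; it is an illustration of the general idea, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(audio, visual):
    """Fuse audio (T_a x d) and visual (T_v x d) features.

    Unlike plain concatenation of pooled features, each audio frame
    computes attention weights over all visual frames and gathers an
    aligned visual context before the two are joined.
    """
    d = audio.shape[-1]
    scores = audio @ visual.T / np.sqrt(d)    # (T_a, T_v) affinities
    attn = softmax(scores, axis=-1)           # per-audio-frame weights over visual frames
    context = attn @ visual                   # (T_a, d) visual context aligned to audio
    return np.concatenate([audio, context], axis=-1)  # (T_a, 2d) fused features

audio = np.random.randn(50, 64)    # 50 audio frames, 64-dim
visual = np.random.randn(25, 64)   # 25 video frames, 64-dim
fused = cross_attention_fuse(audio, visual)
print(fused.shape)  # (50, 128)
```

The fused sequence keeps the audio time axis, so it can feed a standard speech-recognition decoder unchanged.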
no code implementations • 23 Feb 2023 • Chen Chen, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng
Deep neural network based speech enhancement approaches aim to learn a noisy-to-clean transformation using a supervised learning paradigm.
1 code implementation • 22 Feb 2023 • Yuchen Hu, Chen Chen, Heqing Zou, Xionghu Zhong, Eng Siong Chng
To alleviate this problem, we propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness.
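"Gradient modulation" here refers to adjusting the two tasks' gradients so that one does not dominate or conflict with the other. A common form (PCGrad-style conflict projection) is sketched below as an assumed illustration; the paper's specific modulation rule may differ.

```python
import numpy as np

def modulate_gradients(g_main, g_aux):
    """Combine two task gradients, removing conflict.

    If the auxiliary gradient points against the main one (negative dot
    product), project out its conflicting component before summing, so
    the auxiliary task cannot undo progress on the main task.
    """
    dot = np.dot(g_main, g_aux)
    if dot < 0:
        g_aux = g_aux - dot / np.dot(g_main, g_main) * g_main
    return g_main + g_aux

g_enh = np.array([1.0, 0.0])    # toy enhancement gradient
g_sep = np.array([-1.0, 1.0])   # toy separation gradient, conflicting on axis 0
g = modulate_gradients(g_enh, g_sep)
print(g)  # [1. 1.] -- the conflicting component is removed
```

When the two gradients already agree (non-negative dot product), the rule reduces to a plain sum.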
no code implementations • 10 Dec 2022 • Chen Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, Eng Siong Chng
Audio-visual speech recognition (AVSR) has gained remarkable success for ameliorating the noise-robustness of speech recognition.
no code implementations • 13 Apr 2022 • Chen Chen, Yuchen Hu, Nana Hou, Xiaofeng Qi, Heqing Zou, Eng Siong Chng
Although the automatic speech recognition (ASR) task has achieved remarkable success with sequence-to-sequence models, there are two main mismatches between training and testing that can degrade performance: 1) the commonly used cross-entropy criterion maximizes the log-likelihood of the training data, while performance is evaluated by word error rate (WER), not log-likelihood; 2) teacher forcing makes the model depend on the ground truth during training, so it is never exposed to its own predictions before testing.
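The second mismatch (exposure bias from teacher forcing) is often illustrated with scheduled sampling, where the decoder is sometimes fed its own prediction instead of the ground-truth token. The sketch below assumes a toy predictor and is only a stand-in for the paper's actual training scheme.

```python
import random

def scheduled_sampling_step(model_predict, ground_truth, teacher_forcing_prob):
    """Decode one sequence, mixing teacher forcing with self-feeding.

    At each step the next input is the ground-truth token with probability
    `teacher_forcing_prob`, otherwise the model's own prediction, exposing
    the model to its own outputs during training.
    """
    prev, output = "<sos>", []
    for t, gold in enumerate(ground_truth):
        pred = model_predict(prev, t)
        output.append(pred)
        prev = gold if random.random() < teacher_forcing_prob else pred
    return output

# Hypothetical toy predictor: just upper-cases the previous token.
toy_model = lambda prev, t: prev.upper()
out = scheduled_sampling_step(toy_model, ["a", "b", "c"], teacher_forcing_prob=1.0)
print(out)  # fully teacher-forced: ['<SOS>', 'A', 'B']
```

With `teacher_forcing_prob=1.0` this reduces to standard teacher forcing; annealing the probability toward 0 over training gradually exposes the model to its own predictions.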
1 code implementation • 29 Mar 2022 • Heqing Zou, Yuke Si, Chen Chen, Deepu Rajan, Eng Siong Chng
In this paper, we propose an end-to-end speech emotion recognition system using multi-level acoustic information with a newly designed co-attention module.
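A minimal way to picture attention-based fusion of multi-level acoustic information is to weight each feature branch (e.g. MFCC, spectrogram, and pretrained-model embeddings) by its affinity to a shared query and sum. This is an assumed simplification for illustration, not the paper's co-attention module itself.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(branches, query):
    """Fuse several same-dimensional branch embeddings into one vector.

    Each branch gets a scalar attention weight from its dot product with
    a shared query vector; the fused embedding is the weighted sum.
    """
    feats = np.stack(branches)        # (n_branches, d)
    weights = softmax(feats @ query)  # one weight per acoustic level
    return weights @ feats            # (d,) fused utterance embedding

# Toy branch embeddings standing in for multi-level acoustic features.
mfcc_emb = np.ones(4)
spec_emb = np.zeros(4)
fused = attention_fuse([mfcc_emb, spec_emb], query=np.ones(4))
print(fused.shape)  # (4,)
```

A learned co-attention module would additionally let branches condition on each other rather than on a single shared query, but the weighting-and-summing structure is the same.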
no code implementations • 29 Mar 2022 • Chen Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, Eng Siong Chng
Automated audio captioning (AAC) is a cross-modal task that generates natural language to describe the content of input audio.