Search Results for author: Yuchen Hu

Found 29 papers, 14 papers with code

Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

no code implementations • 19 Apr 2024 • Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty

One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks.

GSM8K

Paper
Add Code

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

1 code implementation • 10 Feb 2024 • Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result.

Machine Translation Translation

150

Paper
Code

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

no code implementations • 8 Feb 2024 • Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, EnSiong Chng, Chao-Han Huck Yang

Recent studies have successfully shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output.

Audio-Visual Speech Recognition Automatic Speech Recognition +3

Paper
Add Code

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

1 code implementation • 19 Jan 2024 • Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, EnSiong Chng

To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Code

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

1 code implementation • 7 Jan 2024 • Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, LiRong Dai

Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose a multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs.

Audio-Visual Speech Recognition Automatic Speech Recognition +7

Paper
Code

Generative error correction for code-switching speech recognition using large language models

no code implementations • 17 Oct 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Hexin Liu, Sabato Marco Siniscalchi, Eng Siong Chng

In this work, we propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

1 code implementation • NeurIPS 2023 • Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Macro Siniscalchi, Pin-Yu Chen, Eng Siong Chng

We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

no code implementations • 28 Aug 2023 • Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, LiRong Dai, Jie Zhang

Noise-robust TTS models are often trained using the enhanced speech, which thus suffer from speech distortion and background noise that affect the quality of the synthesized speech.

Speech Enhancement

Paper
Add Code

Noise-aware Speech Enhancement using Diffusion Probabilistic Model

1 code implementation • 16 Jul 2023 • Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

Specifically, we design a noise classification (NC) model to produce acoustic embedding as a noise conditioner for guiding the reverse denoising process.

Denoising Multi-Task Learning +2

Paper
Code

Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition

1 code implementation • 18 Jun 2023 • Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng

In this work, we investigate the noise-invariant visual modality to strengthen robustness of AVSR, which can adapt to any testing noises while without dependence on noisy training data, a. k. a., unsupervised noise adaptation.

Audio-Visual Speech Recognition speech-recognition +1

Paper
Code

MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition

1 code implementation • 18 Jun 2023 • Yuchen Hu, Chen Chen, Ruizhe Li, Heqing Zou, Eng Siong Chng

In this paper, we aim to learn the shared representations across modalities to bridge their gap.

Audio-Visual Speech Recognition Representation Learning +3

Paper
Code

A Neural State-Space Model Approach to Efficient Speech Separation

1 code implementation • 26 May 2023 • Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng

In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).

Representation Learning Speech Separation

Paper
Code

Eeg2vec: Self-Supervised Electroencephalographic Representation Learning

no code implementations • 23 May 2023 • Qiushi Zhu, Xiaoying Zhao, Jie Zhang, Yu Gu, Chao Weng, Yuchen Hu

Recently, many efforts have been made to explore how the brain processes speech using electroencephalographic (EEG) signals, where deep learning-based approaches were shown to be applicable in this field.

EEG Representation Learning

Paper
Add Code

UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning

1 code implementation • 16 May 2023 • Heqing Zou, Meng Shen, Chen Chen, Yuchen Hu, Deepu Rajan, Eng Siong Chng

Multimodal learning aims to imitate human beings to acquire complementary information from multiple modalities for various downstream tasks.

Contrastive Learning Image-text Classification +2

Paper
Code

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

1 code implementation • 16 May 2023 • Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiushi Zhu, Eng Siong Chng

However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for downstream speech recognition task.

Audio-Visual Speech Recognition Automatic Speech Recognition +3

Paper
Code

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

no code implementations • 11 Apr 2023 • Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng

Second, during finetuning we propose a Transformer-based code predictor to accurately predict clean codes by modeling the global dependency of input noisy representations, which enables discovery and restoration of high-quality clean representations with reduced distortions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Metric-oriented Speech Enhancement using Diffusion Probabilistic Model

no code implementations • 23 Feb 2023 • Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng

Deep neural network based speech enhancement technique focuses on learning a noisy-to-clean transformation supervised by paired training data.

Speech Enhancement

Paper
Add Code

Unsupervised Noise adaptation using Data Simulation

no code implementations • 23 Feb 2023 • Chen Chen, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng

Deep neural network based speech enhancement approaches aim to learn a noisy-to-clean transformation using a supervised learning paradigm.

Domain Adaptation Generative Adversarial Network +1

Paper
Add Code

Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation

1 code implementation • 22 Feb 2023 • Yuchen Hu, Chen Chen, Heqing Zou, Xionghu Zhong, Eng Siong Chng

To alleviate this problem, we propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness.

Multi-Task Learning Speech Enhancement +2

Paper
Code

Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

1 code implementation • 22 Feb 2023 • Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Code

Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning

no code implementations • 10 Dec 2022 • Chen Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, Eng Siong Chng

Audio-visual speech recognition (AVSR) has gained remarkable success for ameliorating the noise-robustness of speech recognition.

Audio-Visual Speech Recognition reinforcement-learning +3

Paper
Add Code

The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge--Track 3: Referring Video Object Segmentation

no code implementations • 24 Jun 2022 • Leilei Cao, Zhuang Li, Bo Yan, Feng Zhang, Fengliang Qi, Yuchen Hu, Hongbin Wang

The referring video object segmentation task (RVOS) aims to segment object instances in a given video referred by a language expression in all video frames.

Object object-detection +6

Paper
Add Code

Self-critical Sequence Training for Automatic Speech Recognition

no code implementations • 13 Apr 2022 • Chen Chen, Yuchen Hu, Nana Hou, Xiaofeng Qi, Heqing Zou, Eng Siong Chng

Although automatic speech recognition (ASR) task has gained remarkable success by sequence-to-sequence models, there are two main mismatches between its training and testing that might lead to performance degradation: 1) The typically used cross-entropy criterion aims to maximize log-likelihood of the training data, while the performance is evaluated by word error rate (WER), not log-likelihood; 2) The teacher-forcing method leads to the dependence on ground truth during training, which means that model has never been exposed to its own prediction before testing.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning

no code implementations • 29 Mar 2022 • Chen Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, Eng Siong Chng

Automated Audio captioning (AAC) is a cross-modal task that generates natural language to describe the content of input audio.

Audio captioning Contrastive Learning

Paper
Add Code

Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data

no code implementations • 29 Mar 2022 • Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng Siong Chng

Noise-robust speech recognition systems require large amounts of training data including noisy speech data and corresponding transcripts to achieve state-of-the-art performances in face of various practical environments.

Generative Adversarial Network Robust Speech Recognition +1

Paper
Add Code

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

1 code implementation • 28 Mar 2022 • Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Then, we propose style learning to map the fused feature close to clean feature, in order to learn latent speech information from the latter, i. e., clean "speech style".

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

Off-Policy Evaluation in Partially Observed Markov Decision Processes under Sequential Ignorability

no code implementations • 24 Oct 2021 • Yuchen Hu, Stefan Wager

We consider off-policy evaluation of dynamic treatment rules under sequential ignorability, given an assumption that the underlying system can be modeled as a partially observed Markov decision process (POMDP).

Off-policy evaluation

Paper
Add Code

Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition

2 code implementations • 11 Oct 2021 • Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Speech enhancement (SE) aims to suppress the additive noise from a noisy speech signal to improve the speech's perceptual quality and intelligibility.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021

no code implementations • ACL (IWSLT) 2021 • Dan Liu, Mengge Du, Xiaoxi Li, Yuchen Hu, LiRong Dai

This paper describes USTC-NELSLIP's submissions to the IWSLT2021 Simultaneous Speech Translation task.

Data Augmentation Translation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.