Search Results for author: Yuchen Hu

Found 29 papers, 14 papers with code

Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

no code implementations · 19 Apr 2024 · Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty

One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks.

GSM8K

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

1 code implementation · 10 Feb 2024 · Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result.

Machine Translation, Translation

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

no code implementations · 8 Feb 2024 · Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng, Chao-Han Huck Yang

Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of automatic speech recognition (ASR) output.

Audio-Visual Speech Recognition, Automatic Speech Recognition, +3

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

1 code implementation · 19 Jan 2024 · Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, Eng Siong Chng

To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +6

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

1 code implementation · 7 Jan 2024 · Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, LiRong Dai

Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose a multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs.

Audio-Visual Speech Recognition, Automatic Speech Recognition, +7

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

1 code implementation · NeurIPS 2023 · Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng

We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

no code implementations · 28 Aug 2023 · Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, LiRong Dai, Jie Zhang

Noise-robust TTS models are often trained on enhanced speech and thus suffer from speech distortion and background noise that affect the quality of the synthesized speech.

Speech Enhancement

Noise-aware Speech Enhancement using Diffusion Probabilistic Model

1 code implementation · 16 Jul 2023 · Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

Specifically, we design a noise classification (NC) model to produce acoustic embedding as a noise conditioner for guiding the reverse denoising process.

Denoising, Multi-Task Learning, +2

Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition

1 code implementation · 18 Jun 2023 · Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng

In this work, we investigate the noise-invariant visual modality to strengthen the robustness of AVSR, which can adapt to any testing noise without depending on noisy training data, a.k.a. unsupervised noise adaptation.

Audio-Visual Speech Recognition, speech-recognition, +1

A Neural State-Space Model Approach to Efficient Speech Separation

1 code implementation · 26 May 2023 · Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng

In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).

Representation Learning, Speech Separation

Eeg2vec: Self-Supervised Electroencephalographic Representation Learning

no code implementations · 23 May 2023 · Qiushi Zhu, Xiaoying Zhao, Jie Zhang, Yu Gu, Chao Weng, Yuchen Hu

Recently, many efforts have been made to explore how the brain processes speech using electroencephalographic (EEG) signals, where deep learning-based approaches were shown to be applicable in this field.

EEG, Representation Learning

UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning

1 code implementation · 16 May 2023 · Heqing Zou, Meng Shen, Chen Chen, Yuchen Hu, Deepu Rajan, Eng Siong Chng

Multimodal learning aims to imitate human beings to acquire complementary information from multiple modalities for various downstream tasks.

Contrastive Learning, Image-text Classification, +2

Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

1 code implementation · 16 May 2023 · Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiushi Zhu, Eng Siong Chng

However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in sub-optimal multimodal representations for the downstream speech recognition task.

Audio-Visual Speech Recognition, Automatic Speech Recognition, +3

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

no code implementations · 11 Apr 2023 · Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng

Second, during finetuning we propose a Transformer-based code predictor to accurately predict clean codes by modeling the global dependency of input noisy representations, which enables discovery and restoration of high-quality clean representations with reduced distortions.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Metric-oriented Speech Enhancement using Diffusion Probabilistic Model

no code implementations · 23 Feb 2023 · Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng

Deep neural network based speech enhancement techniques focus on learning a noisy-to-clean transformation supervised by paired training data.

Speech Enhancement

Unsupervised Noise adaptation using Data Simulation

no code implementations · 23 Feb 2023 · Chen Chen, Yuchen Hu, Heqing Zou, Linhui Sun, Eng Siong Chng

Deep neural network based speech enhancement approaches aim to learn a noisy-to-clean transformation using a supervised learning paradigm.

Domain Adaptation, Generative Adversarial Network, +1

Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation

1 code implementation · 22 Feb 2023 · Yuchen Hu, Chen Chen, Heqing Zou, Xionghu Zhong, Eng Siong Chng

To alleviate this problem, we propose a novel network to unify speech enhancement and separation with gradient modulation to improve noise-robustness.

Multi-Task Learning, Speech Enhancement, +2

Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition

1 code implementation · 22 Feb 2023 · Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

In this paper, we propose a simple yet effective approach called gradient remedy (GR) to solve interference between task gradients in noise-robust speech recognition, from perspectives of both angle and magnitude.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4

The Second Place Solution for The 4th Large-scale Video Object Segmentation Challenge--Track 3: Referring Video Object Segmentation

no code implementations · 24 Jun 2022 · Leilei Cao, Zhuang Li, Bo Yan, Feng Zhang, Fengliang Qi, Yuchen Hu, Hongbin Wang

The referring video object segmentation task (RVOS) aims to segment object instances in a given video referred by a language expression in all video frames.

Object, object-detection, +6

Self-critical Sequence Training for Automatic Speech Recognition

no code implementations · 13 Apr 2022 · Chen Chen, Yuchen Hu, Nana Hou, Xiaofeng Qi, Heqing Zou, Eng Siong Chng

Although the automatic speech recognition (ASR) task has achieved remarkable success with sequence-to-sequence models, there are two main mismatches between training and testing that might lead to performance degradation: 1) the typically used cross-entropy criterion aims to maximize the log-likelihood of the training data, while performance is evaluated by word error rate (WER), not log-likelihood; 2) the teacher-forcing method creates a dependence on the ground truth during training, which means that the model has never been exposed to its own predictions before testing.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning

no code implementations · 29 Mar 2022 · Chen Chen, Nana Hou, Yuchen Hu, Heqing Zou, Xiaofeng Qi, Eng Siong Chng

Automated audio captioning (AAC) is a cross-modal task that generates natural language to describe the content of input audio.

Audio captioning, Contrastive Learning

Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data

no code implementations · 29 Mar 2022 · Chen Chen, Nana Hou, Yuchen Hu, Shashank Shirol, Eng Siong Chng

Noise-robust speech recognition systems require large amounts of training data, including noisy speech and corresponding transcripts, to achieve state-of-the-art performance in the face of various practical environments.

Generative Adversarial Network, Robust Speech Recognition, +1

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

1 code implementation · 28 Mar 2022 · Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Then, we propose style learning to map the fused feature close to the clean feature, in order to learn latent speech information from the latter, i.e., clean "speech style".

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Off-Policy Evaluation in Partially Observed Markov Decision Processes under Sequential Ignorability

no code implementations · 24 Oct 2021 · Yuchen Hu, Stefan Wager

We consider off-policy evaluation of dynamic treatment rules under sequential ignorability, given an assumption that the underlying system can be modeled as a partially observed Markov decision process (POMDP).

Off-policy evaluation

Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition

2 code implementations · 11 Oct 2021 · Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Speech enhancement (SE) aims to suppress the additive noise from a noisy speech signal to improve the speech's perceptual quality and intelligibility.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3
