Search Results for author: Chao-Han Huck Yang

Found 64 papers, 32 papers with code

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

1 code implementation · 10 Feb 2024 · Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Dong Zhang, Zhehuai Chen, Eng Siong Chng

Leveraging the rich linguistic knowledge and strong reasoning abilities of LLMs, our new paradigm can integrate the rich information in N-best candidates to generate a higher-quality translation result.

Machine Translation Translation

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

no code implementations · 8 Feb 2024 · Chen Chen, Ruizhe Li, Yuchen Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, EnSiong Chng, Chao-Han Huck Yang

Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of automatic speech recognition (ASR) output.

Audio-Visual Speech Recognition Automatic Speech Recognition +3

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

1 code implementation · 19 Jan 2024 · Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, EnSiong Chng

To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

no code implementations · 19 Jan 2024 · Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastow, Jia Xu, Ivan Bulyko, Andreas Stolcke

The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware.

Language Modelling speech-recognition +1
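
As a rough illustration of the LoRA idea referenced above: the pretrained weight W stays frozen and only two small factors B and A are trained, so the adapted weight is W + (alpha / r) · B A. The sketch below is a minimal plain-Python version of that standard formulation; the helper names and the 1024 × 1024, rank-8 sizes are hypothetical and are not the configuration studied in the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_effective_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    w: frozen pretrained weight, shape (d_out, d_in); never updated.
    b: trainable factor, shape (d_out, r), usually initialised to zeros.
    a: trainable factor, shape (r, d_in).
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_param_counts(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """Trainable parameters: full fine-tuning of W vs. the two LoRA factors."""
    return d_out * d_in, r * (d_in + d_out)


# A 1024 x 1024 projection with rank r = 8 (illustrative sizes):
full, lora = lora_param_counts(1024, 1024, 8)
print(full, lora)  # 1048576 16384
```

Because only r · (d_in + d_out) values are trained per adapted matrix, the trainable share in this toy case is 16384 / 1048576 ≈ 1.6%, which is why the approach suits memory-constrained hardware.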

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

no code implementations · 23 Dec 2023 · Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko

Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.

Attribute Language Modelling +4

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

no code implementations · 22 Dec 2023 · Anirudh S. Sundar, Chao-Han Huck Yang, David M. Chan, Shalini Ghosh, Venkatesh Ravichandran, Phani Sankar Nidadavolu

In cases where some data/compute is available, we present Learnable-MAM, a data-driven approach to merging attention matrices, resulting in a further 2.90% relative reduction in WER for ASR and 18.42% relative reduction in AEC compared to fine-tuning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
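
The percentages quoted in this and several other entries are relative reductions, which are easy to confuse with absolute WER differences. A minimal sketch of the arithmetic; the function names and the baseline numbers are hypothetical and are not taken from any of the papers listed here.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline - new) / baseline


def apply_relative_reduction(baseline: float, reduction_pct: float) -> float:
    """Error rate reached after a given relative reduction (in percent)."""
    return baseline * (1.0 - reduction_pct / 100.0)


# Hypothetical numbers for illustration only:
print(relative_reduction(20.0, 19.0))                  # 5.0
print(round(apply_relative_reduction(10.0, 2.90), 2))  # 9.71
```

For example, a 2.90% relative reduction starting from a hypothetical 10.00% WER lands at about 9.71% WER, an absolute change of only 0.29 points.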

Conditional Modeling Based Automatic Video Summarization

no code implementations · 20 Nov 2023 · Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring

The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.

Video Summarization

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

1 code implementation · NeurIPS 2023 · Chen Chen, Yuchen Hu, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Pin-Yu Chen, Eng Siong Chng

We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new evaluation paradigm for ASR error correction with LLMs.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

no code implementations · 27 Sep 2023 · Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction.

Ranked #3 on Speech Recognition on WSJ eval92 (using extra training data)

In-Context Learning speech-recognition +1
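
Second-pass rescoring, one of the two post-processing roles mentioned in this entry, can be sketched as a log-linear combination of first-pass scores with a language-model score. This is a generic N-best rescoring sketch, not the paper's task-activating prompting method; `rescore_nbest`, `lm_weight`, and the toy scores are hypothetical.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank N-best hypotheses by first-pass score + lm_weight * LM log-prob.

    nbest: (hypothesis text, first-pass log-score) pairs from the ASR decoder.
    lm_score: any callable returning a log-probability for a hypothesis; an
      LLM would play this role, but here it is a hypothetical stand-in.
    """
    rescored = [(text, score + lm_weight * lm_score(text)) for text, score in nbest]
    # Sort by combined log-score, best hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy example with made-up scores:
toy_lm = {"recognize speech": -2.0, "wreck a nice beach": -9.0}
nbest = [("wreck a nice beach", -4.0), ("recognize speech", -4.5)]
print(rescore_nbest(nbest, lambda text: toy_lm[text])[0][0])  # recognize speech
```

The first-pass winner loses once the language-model term is added, which is the basic effect rescoring relies on; generative error correction goes further by letting the LLM produce a new transcript rather than only re-ranking the list.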

Can Whisper perform speech-based in-context learning?

no code implementations · 13 Sep 2023 · Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

Language-level adaptation experiments using Chinese dialects showed that, when applying SICL to isolated word ASR, consistent and considerable relative WER reductions, averaging 32.3%, can be achieved using Whisper models of any size on two dialects.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

How to Estimate Model Transferability of Pre-Trained Speech Models?

1 code implementation · 1 Jun 2023 · Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-Yi Lee, Tara N. Sainath

In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks.

A Neural State-Space Model Approach to Efficient Speech Separation

1 code implementation · 26 May 2023 · Chen Chen, Chao-Han Huck Yang, Kai Li, Yuchen Hu, Pin-Jui Ku, Eng Siong Chng

In this work, we introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).

Representation Learning Speech Separation

Differentially Private Adapters for Parameter Efficient Acoustic Modeling

1 code implementation · 19 May 2023 · Chun-Wei Ho, Chao-Han Huck Yang, Sabato Marco Siniscalchi

Evaluated on the open-access Multilingual Spoken Words (MLSW) dataset, our solution reduces the number of trainable parameters by 97.5% using the RAs with only a 4% performance drop with respect to fine-tuning the cross-lingual speech classifier while preserving DP guarantees.

Parameter-Efficient Learning for Text-to-Speech Accent Adaptation

1 code implementation · 18 May 2023 · Li-Jen Yang, Chao-Han Huck Yang, Jen-Tzung Chien

This paper presents a parameter-efficient learning (PEL) approach to developing low-resource accent adaptation for text-to-speech (TTS).

A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model

1 code implementation · 18 May 2023 · Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

In this work, we explore Parameter-Efficient Learning (PEL) techniques to repurpose a General-Purpose Speech Model (GSM) for Arabic dialect identification (ADI).

Dialect Identification

Pre-training Tensor-Train Networks Facilitates Machine Learning with Variational Quantum Circuits

no code implementations · 18 May 2023 · Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh

Variational quantum circuit (VQC) is a promising approach for implementing quantum neural networks on noisy intermediate-scale quantum (NISQ) devices.

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

no code implementations · 19 Jan 2023 · Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman

In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Certified Robustness of Quantum Classifiers against Adversarial Examples through Quantum Noise

no code implementations · 2 Nov 2022 · Jhih-Cing Huang, Yu-Lin Tsai, Chao-Han Huck Yang, Cheng-Fang Su, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo

Recently, quantum classifiers have been found to be vulnerable to adversarial attacks, in which quantum classifiers are deceived by imperceptible noises, leading to misclassification.

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition

no code implementations · 2 Nov 2022 · Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios.

Spoken Command Recognition

Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming

1 code implementation · 2 Nov 2022 · Yun-Ning Hung, Chao-Han Huck Yang, Pin-Yu Chen, Alexander Lerch

In this work, we introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).

Classification Genre classification +3
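
Neural model reprogramming, as described in this entry and in the cross-lingual ASR and spoken command recognition entries, keeps a pretrained model frozen and learns only an input transformation plus a mapping from source labels to target labels. The sketch below shows the inference path under those assumptions: an additive trainable perturbation `delta` and a many-to-one label mapping that averages source logits. `reprogrammed_predict`, the toy `frozen_model`, and the averaging rule are hypothetical illustrations, and training `delta` (normally by gradient descent through the frozen model) is omitted.

```python
from typing import Callable, Dict, List, Sequence


def reprogrammed_predict(
    x: Sequence[float],
    delta: Sequence[float],
    frozen_model: Callable[[List[float]], List[float]],
    label_map: Dict[int, List[int]],
) -> int:
    """Predict a target-task label with a frozen source model.

    delta: trainable input perturbation (the only learned parameters here).
    label_map: target label -> list of source-class indices (many-to-one mapping).
    """
    # Input transformation: add the learned perturbation to the target input.
    x_prog = [xi + di for xi, di in zip(x, delta)]
    source_logits = frozen_model(x_prog)
    # Output mapping: score each target class by averaging its source logits.
    scores = {
        target: sum(source_logits[s] for s in sources) / len(sources)
        for target, sources in label_map.items()
    }
    return max(scores, key=scores.get)


# Hypothetical 3-class "source model" reused for a 2-class target task:
frozen_model = lambda v: [v[0], v[1], v[0] + v[1]]
label_map = {0: [0], 1: [1, 2]}
print(reprogrammed_predict([1.0, 0.0], [0.0, 0.0], frozen_model, label_map))  # 0
print(reprogrammed_predict([1.0, 0.0], [0.0, 2.0], frozen_model, label_map))  # 1
```

Changing only `delta` flips the prediction while the source model is untouched, which is the property that keeps the number of trainable parameters small.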

Inference and Denoise: Causal Inference-based Neural Speech Enhancement

1 code implementation · 2 Nov 2022 · Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao

This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention.

Causal Inference Speech Enhancement

An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition

no code implementations · 12 Oct 2022 · Chao-Han Huck Yang, Jun Qi, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose an ensemble learning framework with Poisson sub-sampling to effectively train a collection of teacher models that provides a differential privacy (DP) guarantee for the training data.

Ensemble Learning Privacy Preserving +3
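
Under Poisson sub-sampling, each training example is included independently with a fixed probability q, so the sample size is itself random; this sampling scheme is commonly used in privacy-amplification analyses for DP. A minimal sketch of drawing one Poisson sub-sample per teacher; `poisson_subsample`, `teacher_subsets`, and the parameter values are hypothetical, and the paper's exact scheme (for example, whether teacher subsets must be disjoint) is not stated in this snippet.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def poisson_subsample(data: Sequence[T], q: float, rng: random.Random) -> List[T]:
    """Include each example independently with probability q (the sampling rate)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    # rng.random() is uniform on [0, 1), so q = 0 keeps nothing and q = 1 keeps everything.
    return [x for x in data if rng.random() < q]


def teacher_subsets(data: Sequence[T], num_teachers: int, q: float, seed: int = 0) -> List[List[T]]:
    """Draw one independent Poisson sub-sample per teacher model."""
    rng = random.Random(seed)
    return [poisson_subsample(data, q, rng) for _ in range(num_teachers)]


subsets = teacher_subsets(list(range(1000)), num_teachers=5, q=0.1)
print([len(s) for s in subsets])  # sizes vary around 100
```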

Theoretical Error Performance Analysis for Variational Quantum Circuit Based Functional Regression

1 code implementation · 8 Jun 2022 · Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hsiu Hsieh

In this work, we first put forth an end-to-end quantum neural network, TTN-VQC, which consists of a quantum tensor network based on a tensor-train network (TTN) for dimensionality reduction and a VQC for functional regression.

Dimensionality Reduction regression

Treatment Learning Causal Transformer for Noisy Image Classification

no code implementations · 29 Mar 2022 · Chao-Han Huck Yang, I-Te Danny Hung, Yi-Chieh Liu, Pin-Yu Chen

In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy by jointly estimating their treatment effects.

Benchmarking Classification +3

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning

no code implementations · 17 Feb 2022 · Hengshun Zhou, Jun Du, Chao-Han Huck Yang, Shifu Xiong, Chin-Hui Lee

Audio-only-based wake word spotting (WWS) is challenging under noisy conditions due to environmental interference in signal transmission.

Network Pruning

When BERT Meets Quantum Temporal Convolution Learning for Text Classification in Heterogeneous Computing

no code implementations · 17 Feb 2022 · Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Yu Tsao, Pin-Yu Chen

Our experiments on intent classification show that our proposed BERT-QTC model attains competitive experimental results on the Snips and ATIS spoken language datasets.

intent-classification Intent Classification +4

Pessimistic Model Selection for Offline Deep Reinforcement Learning

no code implementations · 29 Nov 2021 · Chao-Han Huck Yang, Zhengling Qi, Yifan Cui, Pin-Yu Chen

Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.

Decision Making Model Selection +2

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

1 code implementation · 16 Oct 2021 · Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee

We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions.

Acoustic Scene Classification Scene Classification +1

Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition

1 code implementation · 8 Oct 2021 · Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system.

Spoken Command Recognition Transfer Learning

QTN-VQC: An End-to-End Learning framework for Quantum Neural Networks

no code implementations · 6 Oct 2021 · Jun Qi, Chao-Han Huck Yang, Pin-Yu Chen

The advent of noisy intermediate-scale quantum (NISQ) computers raises a crucial challenge to design quantum neural networks for fully quantum learning tasks.

Longer Version for "Deep Context-Encoding Network for Retinal Image Captioning"

no code implementations · 30 May 2021 · Jia-Hong Huang, Ting-Wei Wu, Chao-Han Huck Yang, Marcel Worring

Automatically generating medical reports for retinal images is one of the promising ways to help ophthalmologists reduce their workload and improve work efficiency.

Avg Image Captioning +1

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification

no code implementations · 2 Apr 2021 · Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose using an adversarial autoencoder (AAE) to replace the generative adversarial network (GAN) in the private aggregation of teacher ensembles (PATE), a solution for ensuring differential privacy in speech applications.

Ranked #3 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Generative Adversarial Network Keyword Spotting +1
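
For context on the PATE framework mentioned in this entry: teacher models vote on a label, and the aggregate is released only after noise is added to the vote counts. The sketch below shows the Laplace noisy-max aggregation from the original PATE formulation; it does not cover the AAE-based student that this paper proposes, and `noisy_max_aggregate` and its parameters are hypothetical names.

```python
import random
from collections import Counter
from typing import Sequence


def noisy_max_aggregate(
    teacher_votes: Sequence[int],
    num_classes: int,
    laplace_scale: float,
    rng: random.Random,
) -> int:
    """Return the class with the highest Laplace-perturbed vote count."""
    if laplace_scale <= 0:
        raise ValueError("laplace_scale must be positive")
    counts = Counter(teacher_votes)
    noisy_counts = []
    for c in range(num_classes):
        # Laplace(0, b) noise as the difference of two Exponential(rate = 1/b) samples.
        noise = rng.expovariate(1.0 / laplace_scale) - rng.expovariate(1.0 / laplace_scale)
        noisy_counts.append(counts.get(c, 0) + noise)
    return max(range(num_classes), key=lambda c: noisy_counts[c])


# 100 hypothetical teachers: 90 vote for class 2, 10 vote for class 1.
votes = [2] * 90 + [1] * 10
print(noisy_max_aggregate(votes, num_classes=3, laplace_scale=1.0, rng=random.Random(0)))
```

A larger `laplace_scale` gives stronger privacy per query but makes the released label less reliable when the teachers disagree.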

Training a Resilient Q-Network against Observational Interference

1 code implementation · 18 Feb 2021 · Chao-Han Huck Yang, I-Te Danny Hung, Yi Ouyang, Pin-Yu Chen

Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications.

Causal Inference

Multi-task Language Modeling for Improving Speech Recognition of Rare Words

no code implementations · 23 Nov 2020 · Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko

We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test set and by 2.6% on a rare word test set in terms of relative word error rate reduction (WERR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

1 code implementation · 3 Nov 2020 · Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed.

Acoustic Scene Classification Classification +4

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition

2 code implementations · 26 Oct 2020 · Chao-Han Huck Yang, Jun Qi, Samuel Yen-Chi Chen, Pin-Yu Chen, Sabato Marco Siniscalchi, Xiaoli Ma, Chin-Hui Lee

Testing on the Google Speech Commands Dataset, the proposed QCNN encoder attains a competitive accuracy of 95.12% in a decentralized model, which is better than the previous architectures using centralized RNN models with convolutional features.

Ranked #1 on Keyword Spotting on Google Speech Commands (10-keyword Speech Commands dataset metric)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

2 code implementations · 25 Jul 2020 · Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, our experiments on multi-channel speech enhancement with a simulated noisy WSJ0 corpus demonstrate that our proposed hybrid CNN-TT architecture achieves better results than both DNN and CNN models, with higher enhanced speech quality and smaller parameter sizes.

regression Speech Enhancement

Wavelet Channel Attention Module with a Fusion Network for Single Image Deraining

no code implementations · 17 Jul 2020 · Hao-Hsiang Yang, Chao-Han Huck Yang, Yu-Chiang Frank Wang

The wavelet transform and the inverse wavelet transform are substituted for down-sampling and up-sampling, so the feature maps from the wavelet transform and convolutions contain different frequencies and scales.

Single Image Deraining
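
To make the deraining entry's idea of using a wavelet transform as down-sampling concrete: one level of a 2D Haar transform halves each spatial dimension and splits the image into four sub-bands, and its inverse restores the original exactly. A minimal plain-Python sketch, assuming an orthonormal Haar basis and even image dimensions; the paper's wavelet choice and channel handling may differ, and `haar_dwt2` / `haar_idwt2` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[float]]


def _zeros(h: int, w: int) -> Image:
    return [[0.0] * w for _ in range(h)]


def haar_dwt2(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One-level 2D Haar transform: returns (LL, HL, LH, HH) at half resolution."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, hl, lh, hh = (_zeros(h // 2, w // 2) for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # low-frequency average
            hl[i // 2][j // 2] = (a - b + c - d) / 2  # horizontal differences
            lh[i // 2][j // 2] = (a + b - c - d) / 2  # vertical differences
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal differences
    return ll, hl, lh, hh


def haar_idwt2(ll: Image, hl: Image, lh: Image, hh: Image) -> Image:
    """Inverse of haar_dwt2: up-samples the four sub-bands back to full resolution."""
    h2, w2 = len(ll), len(ll[0])
    img = _zeros(2 * h2, 2 * w2)
    for i in range(h2):
        for j in range(w2):
            s, x, y, z = ll[i][j], hl[i][j], lh[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


img = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(haar_idwt2(*haar_dwt2(img)) == img)  # True
```

Unlike strided pooling, this down-sampling discards no information, because the high-frequency sub-bands are kept alongside the low-frequency one.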

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

no code implementations · 31 Mar 2020 · Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Chin-Hui Lee

Recent studies have highlighted adversarial examples as ubiquitous threats to the deep neural network (DNN) based speech recognition systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Y-net: Multi-scale feature aggregation network with wavelet structure similarity loss function for single image dehazing

1 code implementation · 31 Mar 2020 · Hao-Hsiang Yang, Chao-Han Huck Yang, Yi-Chang James Tsai

Extensive experimental results demonstrate that the proposed Y-net with the W-SSIM loss function restores high-quality clear images and outperforms state-of-the-art algorithms.

Image Dehazing Single Image Dehazing +2

Enhanced Adversarial Strategically-Timed Attacks against Deep Reinforcement Learning

no code implementations · 20 Feb 2020 · Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Yi Ouyang, I-Te Danny Hung, Chin-Hui Lee, Xiaoli Ma

Recent deep neural network-based techniques, especially those equipped with the ability of self-adaptation at the system level such as deep reinforcement learning (DRL), are shown to possess many advantages in optimizing robot learning systems (e.g., autonomous navigation and continuous robot arm control).

Autonomous Navigation reinforcement-learning +1

Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network

2 code implementations · 3 Feb 2020 · Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco Siniscalchi, Chin-Hui Lee

Finally, in 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.

regression Speech Enhancement
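
The parameter savings reported in this entry come from storing a weight matrix as a chain of small tensor-train (TT) cores instead of one dense matrix. A minimal sketch of the parameter count, assuming the common TT-matrix layout where core k has shape (r_{k-1}, m_k, n_k, r_k) with boundary ranks r_0 = r_d = 1; the mode sizes and ranks in the example are hypothetical and are not the paper's configuration, and biases are ignored.

```python
from math import prod
from typing import Sequence


def tt_matrix_params(in_modes: Sequence[int], out_modes: Sequence[int], ranks: Sequence[int]) -> int:
    """Number of parameters in a TT-format weight matrix.

    The matrix maps prod(in_modes) inputs to prod(out_modes) outputs; core k has
    shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("need d in/out modes and d + 1 ranks with r_0 = r_d = 1")
    return sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d))


def dense_params(in_modes: Sequence[int], out_modes: Sequence[int]) -> int:
    """Parameters of the equivalent dense matrix (biases ignored)."""
    return prod(in_modes) * prod(out_modes)


# Hypothetical 1024 x 1024 layer factored as 4 * 16 * 16 on each side, TT-ranks (1, 4, 4, 1):
modes = (4, 16, 16)
print(dense_params(modes, modes))                    # 1048576
print(tt_matrix_params(modes, modes, (1, 4, 4, 1)))  # 5184
```

The TT-ranks control the trade-off: raising them increases expressiveness and parameter count together, which is the knob behind comparisons such as 20 million TTN parameters versus 68 million DNN parameters.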

Evolving Neural Networks through a Reverse Encoding Tree

1 code implementation · 3 Feb 2020 · Haoling Zhang, Chao-Han Huck Yang, Hector Zenil, Narsis A. Kiani, Yue Shen, Jesper N. Tegner

Using RET, two types of approaches -- NEAT with Binary search encoding (Bi-NEAT) and NEAT with Golden-Section search encoding (GS-NEAT) -- have been designed to solve problems in benchmark continuous learning environments such as logic gates, Cartpole, and Lunar Lander, and tested against classical NEAT and FS-NEAT as baselines.

Variational Quantum Circuits for Deep Reinforcement Learning

1 code implementation · 30 Jun 2019 · Samuel Yen-Chi Chen, Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Hsi-Sheng Goan

To the best of our knowledge, this work is the first proof-of-principle demonstration of variational quantum circuits to approximate the deep $Q$-value function for decision-making and policy-selection reinforcement learning with experience replay and target network.

BIG-bench Machine Learning Decision Making +3

When Causal Intervention Meets Adversarial Examples and Image Masking for Deep Neural Networks

1 code implementation · 9 Feb 2019 · Chao-Han Huck Yang, Yi-Chieh Liu, Pin-Yu Chen, Xiaoli Ma, Yi-Chang James Tsai

To study the intervention effects on pixel-level features for causal reasoning, we introduce pixel-wise masking and adversarial perturbation.

Causal Inference Visual Reasoning

Controllability, Multiplexing, and Transfer Learning in Networks using Evolutionary Learning

1 code implementation · 14 Nov 2018 · Rise Ooi, Chao-Han Huck Yang, Pin-Yu Chen, Vìctor Eguìluz, Narsis Kiani, Hector Zenil, David Gomez-Cabrero, Jesper Tegnèr

Next, (2) the learned networks are technically controllable as only a small number of driver nodes are required to move the system to a new state.

Transfer Learning
