Search Results for author: Kaizhi Qian

Found 18 papers, 11 papers with code

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

1 code implementation • 15 Nov 2023 • Bairu Hou, Yujian Liu, Kaizhi Qian, Jacob Andreas, Shiyu Chang, Yang Zhang

Uncertainty decomposition refers to the task of decomposing the total uncertainty of a model into data (aleatoric) uncertainty, resulting from the inherent complexity or ambiguity of the data, and model (epistemic) uncertainty, resulting from the lack of knowledge in the model.

Uncertainty Quantification

Paper
Code

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

no code implementations • 23 Jun 2023 • Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin

Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader applications: (1) The difficulty of introducing scalability into the model to support more languages with limited training, inference, and storage overhead; (2) The low-resource adaptation ability that enables effective low-resource adaptation while avoiding over-fitting and catastrophic forgetting issues.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

no code implementations • CVPR 2023 • Kun Su, Kaizhi Qian, Eli Shlizerman, Antonio Torralba, Chuang Gan

Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound.

Paper
Add Code

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

1 code implementation • 2 Nov 2022 • Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Lin

We believe S$^3$-Router has provided a new perspective for practical deployment of speech SSL models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

1 code implementation • 20 Apr 2022 • Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks.

Disentanglement Self-Supervised Learning

414

Paper
Code

WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models

1 code implementation • 29 Mar 2022 • Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

We show that WavPrompt is a few-shot learner that can perform speech understanding tasks better than a naive text baseline.

Few-Shot Learning Language Modelling +1

Paper
Code

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

1 code implementation • 29 Mar 2022 • Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Code

SpeechSplit 2.0: Unsupervised speech disentanglement for voice conversion Without tuning autoencoder Bottlenecks

1 code implementation • 26 Mar 2022 • Chak Ho Chan, Kaizhi Qian, Yang Zhang, Mark Hasegawa-Johnson

SpeechSplit can perform aspect-specific voice conversion by disentangling speech into content, rhythm, pitch, and timbre using multiple autoencoders in an unsupervised manner.

Disentanglement Voice Conversion

122

Paper
Code

On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

no code implementations • 4 Oct 2021 • Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass

Are end-to-end text-to-speech (TTS) models over-parametrized?

Knowledge Distillation Speech Synthesis

Paper
Add Code

Global Rhythm Style Transfer Without Text Transcriptions

1 code implementation • 16 Jun 2021 • Kaizhi Qian, Yang Zhang, Shiyu Chang, JinJun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson

In this paper, we propose AutoPST, which can disentangle global prosody style from speech without relying on any text transcriptions.

Representation Learning Style Transfer

248

Paper
Code

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

no code implementations • NeurIPS 2021 • Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass

We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Speech Denoising with Auditory Models

1 code implementation • 21 Nov 2020 • Mark R. Saddler, Andrew Francl, Jenelle Feather, Kaizhi Qian, Yang Zhang, Josh H. McDermott

Contemporary speech enhancement predominantly relies on audio transforms that are trained to reconstruct a clean speech waveform.

Denoising Speech Denoising +1

Paper
Code

Unsupervised Speech Decomposition via Triple Information Bottleneck

6 code implementations • ICML 2020 • Kaizhi Qian, Yang Zhang, Shiyu Chang, David Cox, Mark Hasegawa-Johnson

Speech information can be roughly decomposed into four components: language content, timbre, pitch, and rhythm.

Style Transfer Voice Conversion

963

Paper
Code

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

1 code implementation • 15 Apr 2020 • Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Recently, AutoVC, a conditional autoencoders (CAEs) based method achieved state-of-the-art results by disentangling the speaker identity and speech content using information-constraining bottlenecks, and it achieves zero-shot conversion by swapping in a different speaker's identity embedding to synthesize a new voice.

Style Transfer Voice Conversion

Paper
Code

An Efficient and Margin-Approaching Zero-Confidence Adversarial Attack

no code implementations • ICLR 2019 • Yang Zhang, Shiyu Chang, Mo Yu, Kaizhi Qian

The second paradigm, called the zero-confidence attack, finds the smallest perturbation needed to cause mis-classification, also known as the margin of an input feature.

Adversarial Attack

Paper
Add Code

Continuous Convolutional Neural Network forNonuniform Time Series

no code implementations • 25 Sep 2019 • Hui Shi, Yang Zhang, Hao Wu, Shiyu Chang, Kaizhi Qian, Mark Hasegawa-Johnson, Jishen Zhao

Convolutional neural network (CNN) for time series data implicitly assumes that the data are uniformly sampled, whereas many event-based and multi-modal data are nonuniform or have heterogeneous sampling rates.

Time Series Time Series Analysis

Paper
Add Code

AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

11 code implementations • 14 May 2019 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson

On the other hand, CVAE training is simple but does not come with the distribution-matching property of a GAN.

Style Transfer Voice Conversion

963

Paper
Code

Deep Learning Based Speech Beamforming

no code implementations • 15 Feb 2018 • Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florencio, Mark Hasegawa-Johnson

On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with variable number of input channels.

Speech Enhancement

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.