Search Results for author: Cheng-I Lai

Found 13 papers, 9 papers with code

Cascading and Direct Approaches to Unsupervised Constituency Parsing on Spoken Sentences

1 code implementation • 15 Mar 2023 • Yuan Tseng, Cheng-I Lai, Hung-Yi Lee

The goal is to determine the spoken sentences' hierarchical syntactic structure in the form of constituency parse trees, such that each node is a span of audio that corresponds to a constituent.

Automatic Speech Recognition (ASR) +3
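For readers unfamiliar with the output format described above, here is a minimal sketch of a constituency parse tree whose nodes are audio spans rather than word spans. The class name, labels, and times are hypothetical illustrations, not taken from the paper's released code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AudioConstituent:
    """One node of a constituency parse tree over speech:
    a syntactic label attached to a time span of the audio."""
    label: str          # e.g. "NP", "VP" (illustrative label set)
    start: float        # span start time in seconds
    end: float          # span end time in seconds
    children: List["AudioConstituent"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

# A toy tree for a short utterance spanning 0.0-1.2 s.
tree = AudioConstituent("S", 0.0, 1.2, [
    AudioConstituent("NP", 0.0, 0.5),
    AudioConstituent("VP", 0.5, 1.2),
])
```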

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

1 code implementation • 20 Apr 2022 • Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks.

Disentanglement • Self-Supervised Learning
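As a rough illustration of the pretrain-then-transfer recipe described above, the sketch below reuses a frozen stand-in encoder and attaches a small downstream head. The encoder class and its dimensions are placeholder assumptions, not the actual ContentVec model or its API.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained self-supervised speech encoder
# (placeholder only, NOT the actual ContentVec model or its API).
class PretrainedSpeechEncoder(nn.Module):
    def __init__(self, feat_dim: int = 768):
        super().__init__()
        self.feat_dim = feat_dim
        self.proj = nn.Linear(80, feat_dim)  # pretend: log-mel frames -> features

    @torch.no_grad()  # frozen after pretraining; no gradients flow into it
    def forward(self, fbank: torch.Tensor) -> torch.Tensor:
        # fbank: (batch, frames, 80) -> representations: (batch, frames, feat_dim)
        return self.proj(fbank)

encoder = PretrainedSpeechEncoder().eval()    # learned on unannotated speech
classifier = nn.Linear(encoder.feat_dim, 10)  # lightweight downstream head

fbank = torch.randn(4, 200, 80)               # dummy batch of log-mel features
reps = encoder(fbank)                         # (4, 200, 768), frozen features
logits = classifier(reps.mean(dim=1))         # mean-pool over time, then classify
```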

Towards Semi-Supervised Semantics Understanding from Speech

no code implementations • 11 Nov 2020 • Cheng-I Lai, Jin Cao, Sravan Bodapati, Shang-Wen Li

Much recent work on Spoken Language Understanding (SLU) falls short in at least one of three ways: models were trained on oracle text input and neglected Automatic Speech Recognition (ASR) outputs, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Speech Recognition +1

Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining

1 code implementation • 26 Oct 2020 • Cheng-I Lai, Yung-Sung Chuang, Hung-Yi Lee, Shang-Wen Li, James Glass

Much recent work on Spoken Language Understanding (SLU) is limited in at least one of three ways: models were trained on oracle text input and neglected ASR errors, models were trained to predict only intents without the slot values, or models were trained on a large amount of in-house data.

Language Modelling • Spoken Language Understanding

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

1 code implementation • 4 May 2020 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi

This is followed by an analysis of synthesis quality, speaker and dialect similarity, and remarks on the effectiveness of our speaker augmentation approach.

Speech Synthesis

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

3 code implementations • 23 Oct 2019 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.

Audio and Speech Processing

Contrastive Predictive Coding Based Feature for Automatic Speaker Verification

1 code implementation • 1 Apr 2019 • Cheng-I Lai

This thesis describes our ongoing work on Contrastive Predictive Coding (CPC) features for speaker verification.

Representation Learning • Speaker Verification

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks

1 code implementation • 1 Apr 2019 • Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak

We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).

Feature Engineering • Voice Conversion

Attentive Filtering Networks for Audio Replay Attack Detection

1 code implementation • 31 Oct 2018 • Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King

In this work, we propose our replay attack detection system, the Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier.

Speaker Verification
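As a rough illustration of the attention-based-filtering-plus-classifier idea described above, the sketch below multiplies a spectrogram-like input by a learned sigmoid mask and feeds the result to a small convolutional classifier. The layer sizes and mask formulation are assumptions, and the stand-in classifier is not the ResNet model used in the paper.

```python
import torch
import torch.nn as nn

class AttentiveFilter(nn.Module):
    """Predicts a (0, 1) mask over the time-frequency map and reweights
    the input with it (a generic sketch, not the paper's exact design)."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Conv2d(channels, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(8, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),             # per-bin weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, freq, time) spectrogram-like features
        return x * self.attention(x)  # element-wise filtering over freq and time

class SpoofDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.filter = AttentiveFilter()
        # Stand-in convolutional classifier; the paper uses a ResNet-based model.
        self.classifier = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, 2),         # genuine vs. replayed
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.filter(x))

scores = SpoofDetector()(torch.randn(2, 1, 257, 400))  # (2, 2) logits
```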
