no code implementations • 25 Dec 2023 • Aditya Ravuri, Erica Cooper, Junichi Yamagishi
Predicting audio quality in voice synthesis and conversion systems is a critical yet challenging task, especially since traditional measures such as the Mean Opinion Score (MOS) are cumbersome to collect at scale.
1 code implementation • 8 Oct 2023 • Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah
That is, the partial rank similarity (PRS) is measured rather than the individual MOS values, as with the L1 loss.
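As a rough illustration of rank-based training versus direct L1 regression, here is a minimal PyTorch sketch; the pairwise margin-ranking loss below is an illustrative stand-in for the paper's PRS objective, not its exact formulation, and the margin value is an assumption.

```python
import torch

def l1_loss(pred, mos):
    # Direct regression: penalize deviation from each individual MOS value.
    return torch.abs(pred - mos).mean()

def pairwise_rank_loss(pred, mos, margin=0.1):
    # Rank-based alternative (illustrative, not the paper's exact PRS loss):
    # penalize pairs of utterances whose predicted ordering disagrees
    # with the ordering of their ground-truth MOS values.
    diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)  # pred[j] - pred[i]
    diff_true = mos.unsqueeze(0) - mos.unsqueeze(1)    # mos[j] - mos[i]
    sign = torch.sign(diff_true)                       # desired ordering
    loss = torch.relu(margin - sign * diff_pred)       # hinge on violations
    return loss[sign != 0].mean()                      # ignore tied pairs
```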
no code implementations • 4 Oct 2023 • Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi
We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech.
1 code implementation • 28 May 2023 • Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi
To properly measure misclassified ranges and better evaluate spoof localization performance, we upgrade point-based EER to range-based EER.
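For context, a minimal NumPy sketch of the conventional point-based EER that the paper generalizes; the range-based variant described above instead scores misclassified time ranges, which this sketch does not implement.

```python
import numpy as np

def point_based_eer(bona_scores, spoof_scores):
    # Assumes higher scores indicate bona fide speech. Sweep a decision
    # threshold over all observed scores and return the operating point
    # where false-acceptance and false-rejection rates are (roughly) equal.
    thresholds = np.sort(np.concatenate([bona_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])
    frr = np.array([(bona_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```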
1 code implementation • Interspeech 2023 • Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi
The ability of countermeasure models to generalize from seen speech synthesis methods to unseen ones has been investigated in the ASVspoof challenge.
no code implementations • 17 May 2023 • Erica Cooper, Junichi Yamagishi
Mean Opinion Score (MOS) is a popular measure for evaluating synthesized speech.
no code implementations • 1 Sep 2022 • Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi
Conventional automatic speaker verification systems can usually be decomposed into a front-end model such as time delay neural network (TDNN) for extracting speaker embeddings and a back-end model such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA) for similarity scoring.
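A minimal sketch of the back-end scoring step in such a pipeline, assuming embeddings have already been extracted by a front-end such as a TDNN; cosine scoring is shown here because PLDA scoring requires trained model parameters.

```python
import numpy as np

def cosine_backend(enroll_emb, test_emb):
    # Back-end similarity scoring between two speaker embeddings
    # (e.g., x-vectors from a TDNN front-end). Higher = more similar.
    e = enroll_emb / np.linalg.norm(enroll_emb)
    t = test_emb / np.linalg.norm(test_emb)
    return float(np.dot(e, t))
```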
no code implementations • 11 Apr 2022 • Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi
Since the short spoofed speech segments to be embedded by attackers are of variable length, six different temporal resolutions are considered, ranging from as short as 20 ms to as long as 640 ms. Third, we propose a new CM that enables the simultaneous use of segment-level labels at different temporal resolutions as well as utterance-level labels, performing utterance- and segment-level detection at the same time.
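As a sketch of how segment-level labels at several temporal resolutions might be derived, assuming a fine-grained per-frame spoof mask is available; the 10 ms frame step and the any-frame-spoofed labeling rule are assumptions for illustration, not the paper's specification.

```python
import numpy as np

RESOLUTIONS_MS = [20, 40, 80, 160, 320, 640]  # six temporal resolutions

def segment_labels(frame_spoof_mask, frame_ms=10):
    # Derive segment-level spoof labels at several temporal resolutions
    # from a per-frame mask (1 = spoofed, 0 = bona fide).
    labels = {}
    for res in RESOLUTIONS_MS:
        frames_per_seg = res // frame_ms
        n_seg = len(frame_spoof_mask) // frames_per_seg
        segs = frame_spoof_mask[:n_seg * frames_per_seg].reshape(
            n_seg, frames_per_seg)
        # A segment is labeled spoofed if any frame inside it is spoofed.
        labels[res] = segs.max(axis=1)
    return labels
```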
1 code implementation • 18 Oct 2021 • Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda
An effective approach to automatically predict the subjective rating for synthetic speech is to train on a listening test dataset with human-annotated scores.
1 code implementation • 6 Oct 2021 • Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi
Automatic methods to predict listener opinions of synthesized speech remain elusive since listeners, systems being evaluated, characteristics of the speech, and even the instructions given and the rating scale all vary from test to test.
no code implementations • 4 Oct 2021 • Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass
Are end-to-end text-to-speech (TTS) models over-parametrized?
1 code implementation • 24 Jul 2021 • Xuan Shi, Erica Cooper, Junichi Yamagishi
Constructing an embedding space for musical instrument sounds that can meaningfully represent new and unseen instruments is important for downstream music generation tasks such as multi-instrument synthesis and timbre transfer.
1 code implementation • 4 May 2021 • Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi
This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data.
no code implementations • 6 Apr 2021 • Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi, Jose Patino, Nicholas Evans
By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with entirely spoofed utterances.
1 code implementation • 4 Apr 2021 • Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi
Probabilistic linear discriminant analysis (PLDA) or cosine similarity has been widely used in traditional speaker verification systems as a back-end technique to measure pairwise similarities.
Ranked #1 on Speaker Verification on CN-CELEB
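A minimal sketch of the fixed aggregation baseline for multiple enrollment utterances: average the enrollment embeddings and score with cosine similarity. The paper's attention back-end replaces this fixed average with a learned weighting; that model is not reproduced here.

```python
import numpy as np

def multi_enroll_cosine(enroll_embs, test_emb):
    # Fixed aggregation: average multiple enrollment embeddings into a
    # single centroid, then score with cosine similarity. An attention
    # back-end would instead learn how to weight the enrollments.
    centroid = np.mean(enroll_embs, axis=0)
    centroid /= np.linalg.norm(centroid)
    test = test_emb / np.linalg.norm(test_emb)
    return float(np.dot(centroid, test))
```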
no code implementations • 10 Nov 2020 • Erica Cooper, Xin Wang, Yi Zhao, Yusuke Yasuda, Junichi Yamagishi
We explore pretraining strategies including choice of base corpus with the aim of choosing the best strategy for zero-shot multi-speaker end-to-end synthesis.
no code implementations • 21 Oct 2020 • Antoine Perquin, Erica Cooper, Junichi Yamagishi
Thanks to this property, we show that grapheme embeddings learned by Tacotron models can be useful for tasks such as grapheme-to-phoneme conversion and control of the pronunciation in synthetic speech.
1 code implementation • 21 Oct 2020 • Jennifer Williams, Yi Zhao, Erica Cooper, Junichi Yamagishi
Additionally, phones can be recognized from sub-phone VQ codebook indices in our semi-supervised VQ-VAE better than from a self-supervised VQ-VAE with global conditions.
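A minimal sketch of the vector-quantization step that produces such codebook indices, assuming frame-level encoder outputs z of shape (T, D) and a learned codebook of shape (K, D):

```python
import torch

def vector_quantize(z, codebook):
    # Map each frame embedding to its nearest codebook entry; the
    # resulting discrete indices are what downstream phone recognition
    # reads from.
    dists = torch.cdist(z, codebook)   # (T, K) pairwise L2 distances
    indices = dists.argmin(dim=1)      # (T,) nearest codebook index
    return codebook[indices], indices  # quantized frames and indices
```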
1 code implementation • 4 May 2020 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi
This is followed by an analysis of synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach.
3 code implementations • 23 Oct 2019 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi
While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.
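A minimal sketch of embedding-based speaker conditioning, assuming a PyTorch encoder output of shape (B, T, D_enc) and an externally computed speaker embedding of shape (B, D_spk); concatenation is one common conditioning choice, not necessarily the exact mechanism used in the paper.

```python
import torch

def condition_on_speaker(encoder_out, spk_emb):
    # Zero-shot conditioning: broadcast an externally computed speaker
    # embedding (e.g., from a speaker-verification model) across the
    # encoder timesteps and concatenate it to each frame.
    T = encoder_out.size(1)
    spk = spk_emb.unsqueeze(1).expand(-1, T, -1)  # (B, T, D_spk)
    return torch.cat([encoder_out, spk], dim=-1)  # (B, T, D_enc + D_spk)
```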