Search Results for author: Roberto Barra-Chicote

Found 24 papers, 3 papers with code

Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations

no code implementations5 Feb 2024 Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Iván Vallés-Pérez, Biel Tura-Vecino, Piotr Biliński, Mateusz Lajszczak, Grzegorz Beringer, Roberto Barra-Chicote, Jaime Lorenzo-Trueba

Using speaker-disentangled codes to train LLMs for text-to-speech (TTS) allows the LLM to generate the content and the style of the speech only from the text, similarly to humans, while the speaker identity is provided by the decoder of the VC model.

In-Context Learning, Voice Conversion

Creating New Voices using Normalizing Flows

no code implementations22 Dec 2023 Piotr Bilinski, Thomas Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote, Daniel Korzekwa

As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities.

Speech Synthesis, Voice Conversion

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces

no code implementations23 Jul 2023 Ivan Vallés-Pérez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote

We train a CLIP-based model with the aim to learn shared representations of phonetic and acoustic spaces.
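The abstract does not spell out the training objective, but a CLIP-based model of paired spaces typically means a symmetric contrastive (InfoNCE) loss over a batch of matched phonetic/acoustic embedding pairs. A minimal sketch, assuming hypothetical pre-computed embeddings (the encoders themselves are omitted):

```python
import numpy as np

def clip_style_loss(phon_emb, acou_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired phonetic/acoustic
    embeddings, CLIP-style. Matched pairs sit on the diagonal of the
    cosine-similarity matrix; the loss pulls them together and pushes
    mismatched pairs apart."""
    # L2-normalize so the dot product is cosine similarity.
    p = phon_emb / np.linalg.norm(phon_emb, axis=1, keepdims=True)
    a = acou_emb / np.linalg.norm(acou_emb, axis=1, keepdims=True)
    logits = p @ a.T / temperature            # (batch, batch)
    labels = np.arange(len(logits))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Cross-entropy in both directions: phonetic->acoustic and back.
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
# Perfectly aligned pairs give a near-zero loss...
aligned = clip_style_loss(emb, emb)
# ...while unrelated embeddings give a loss near log(batch_size).
mismatched = clip_style_loss(emb, rng.normal(size=(8, 16)))
```

The temperature and embedding dimension here are illustrative defaults, not values from the paper.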

Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows

no code implementations10 Nov 2022 Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa

Regional accents of the same language affect not only how words are pronounced (i.e., phonetic content), but also prosodic aspects of speech such as speaking rate and intonation.

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

no code implementations4 Nov 2022 Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

By fine-tuning an ASR model on synthetic stuttered speech, we are able to reduce the word error rate by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation on fluent utterances.

Automatic Speech Recognition (ASR), +1
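The "relative" qualifier in the abstract's 5.7% figure is worth unpacking: it is a fraction of the baseline error rate, not an absolute percentage-point change. A small arithmetic sketch (the 20.0% baseline WER below is hypothetical, chosen only for illustration):

```python
def relative_reduction(baseline, improved):
    """Relative error reduction in percent, i.e. what '5.7% relative' means:
    the fraction of the baseline error that was removed."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical illustration: if the baseline WER on stuttered utterances
# were 20.0%, a 5.7% relative reduction lowers it to 20.0 * (1 - 0.057),
# i.e. about 18.86% absolute -- not to 14.3%.
improved_wer = 20.0 * (1 - 0.057)
```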

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

no code implementations4 Jul 2022 Magdalena Proszewska, Grzegorz Beringer, Daniel Sáez-Trigueros, Thomas Merritt, Abdelhamid Ezzerg, Roberto Barra-Chicote

We evaluate our models in terms of intelligibility, speaker similarity and naturalness for intra- and cross-lingual conversion in seen and unseen languages.

Voice Conversion

Prosodic Alignment for off-screen automatic dubbing

no code implementations6 Apr 2022 Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote

The goal of automatic dubbing is to perform speech-to-speech translation while achieving audiovisual coherence.

Speech-to-Speech Translation, Translation

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

no code implementations16 Feb 2022 Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

It uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system and marks a conceptual shift in the existing TTS paradigm, framing the few-shot TTS problem as a VC task.

Speech Synthesis, Voice Conversion

Machine Translation Verbosity Control for Automatic Dubbing

no code implementations8 Oct 2021 Surafel M. Lakew, Marcello Federico, Yue Wang, Cuong Hoang, Yogesh Virkar, Roberto Barra-Chicote, Robert Enyedi

Automatic dubbing aims at seamlessly replacing the speech in a video document with synthetic speech in a different language.

Machine Translation, Translation

Improving the expressiveness of neural vocoding with non-affine Normalizing Flows

no code implementations16 Jun 2021 Adam Gabryś, Yunlong Jiao, Viacheslav Klimkov, Daniel Korzekwa, Roberto Barra-Chicote

In the waveform reconstruction task, the proposed model closes the naturalness and signal quality gap from the original Parallel WaveNet (PW) to recordings by $10\%$, and from other state-of-the-art neural vocoding systems by more than $60\%$.
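The paper's exact transform is not given in this snippet, but the core idea of swapping affine for non-affine flow steps can be illustrated with a toy elementwise map that stays invertible and keeps a tractable log-determinant. This is an assumption-laden sketch, not the model's actual layer:

```python
import numpy as np

def affine_step(x, a, b):
    """Affine flow step y = a*x + b; elementwise log|dy/dx| = log|a|."""
    return a * x + b, np.full_like(x, np.log(abs(a)))

def nonaffine_step(x, alpha):
    """A non-affine yet invertible elementwise map: y = x + alpha*tanh(x).
    For 0 <= alpha < 1 the derivative 1 + alpha*(1 - tanh(x)^2) is strictly
    positive, so the map is monotonic and the log-determinant is tractable,
    yet the transform is more expressive than a scale-and-shift."""
    y = x + alpha * np.tanh(x)
    logdet = np.log1p(alpha * (1.0 - np.tanh(x) ** 2))
    return y, logdet

def invert_nonaffine(y, alpha, iters=80):
    """Invert y = x + alpha*tanh(x) by bisection (no closed-form inverse)."""
    lo, hi = y - abs(alpha), y + abs(alpha)   # tanh is bounded by +/- 1
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        too_low = mid + alpha * np.tanh(mid) < y
        lo = np.where(too_low, mid, lo)
        hi = np.where(too_low, hi, mid)
    return (lo + hi) / 2.0

x = np.linspace(-3.0, 3.0, 11)
y, logdet = nonaffine_step(x, alpha=0.5)
x_back = invert_nonaffine(y, alpha=0.5)
```

The trade-off this illustrates: the non-affine step's inverse needs an iterative solve, which is one reason affine couplings are the default in flow-based vocoders.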

SynthASR: Unlocking Synthetic Data for Speech Recognition

no code implementations14 Jun 2021 Amin Fazel, Wei Yang, YuLan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo

Our observations show that SynthASR holds great promise in training the state-of-the-art large-scale E2E ASR models for new applications while reducing the costs and dependency on production data.

Automatic Speech Recognition (ASR), +3

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows

no code implementations10 Jun 2021 Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo

This paper proposes a new neural text-to-speech model that approaches the disentanglement problem by conditioning a Tacotron2-like architecture on flow-normalized speaker embeddings, and by substituting the reference encoder with a new learned latent distribution that models intra-sentence prosodic variability.

Disentanglement, Sentence
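"Flow-normalizing" an embedding usually means passing it through a stack of invertible steps trained to map it toward a standard Gaussian. One standard building block is the RealNVP-style affine coupling layer; a minimal sketch with toy linear scale/shift nets (real models use small MLPs, and this is not claimed to be the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """One RealNVP-style coupling step: half of the embedding passes
    through unchanged, the other half is scaled and shifted by functions
    of the kept half. The step is exactly invertible and its
    log-determinant is just the sum of the log-scales."""

    def __init__(self, dim):
        self.half = dim // 2
        # Toy linear "networks" producing scale and shift.
        self.ws = rng.normal(scale=0.1, size=(self.half, dim - self.half))
        self.wt = rng.normal(scale=0.1, size=(self.half, dim - self.half))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = x1 @ self.ws, x1 @ self.wt
        y2 = x2 * np.exp(s) + t
        logdet = s.sum(axis=1)
        return np.concatenate([x1, y2], axis=1), logdet

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = y1 @ self.ws, y1 @ self.wt
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)

layer = AffineCoupling(dim=8)
emb = rng.normal(size=(4, 8))          # hypothetical speaker embeddings
z, logdet = layer.forward(emb)         # "normalized" embeddings
recovered = layer.inverse(z)           # exact inversion
```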

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

no code implementations29 Dec 2020 Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS).

Data Augmentation

Parallel WaveNet conditioned on VAE latent vectors

no code implementations17 Dec 2020 Jonas Rohnke, Tom Merritt, Jaime Lorenzo-Trueba, Adam Gabrys, Vatsal Aggarwal, Alexis Moinet, Roberto Barra-Chicote

In this paper we investigate the use of a sentence-level conditioning vector to improve the signal quality of a Parallel WaveNet neural vocoder.

Sentence, Speech Synthesis, +1
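A sentence-level conditioning vector drawn from a VAE latent is typically sampled with the reparameterization trick and regularized by a closed-form KL term toward a standard Gaussian. A small sketch under those standard-VAE assumptions (the posterior parameters below are hypothetical, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, diag(exp(logvar))) via the reparameterization
    trick, z = mu + sigma * eps, so gradients can flow through sampling."""
    eps = rng.normal(size=np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))

# Hypothetical sentence-level posterior parameters from an utterance encoder.
mu = np.array([0.2, -0.1, 0.05])
logvar = np.array([-1.0, -1.2, -0.8])
z = reparameterize(mu, logvar)          # the vector conditioning the vocoder
kl = kl_to_standard_normal(mu, logvar)  # regularizer added to the loss
```

At synthesis time the mean (or a prior sample) would stand in for z, which is what makes a sentence-level latent usable as a quality-control knob.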

BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization

no code implementations4 Feb 2020 Henry B. Moss, Vatsal Aggarwal, Nishant Prateek, Javier González, Roberto Barra-Chicote

We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approach for few-shot speaker adaptation.

Bayesian Optimization
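The abstract names Bayesian optimization as the fine-tuning machinery but not its components; a common instantiation is a Gaussian-process surrogate with an expected-improvement acquisition. A self-contained sketch on a toy 1-D "learning-rate vs. loss" objective (the objective, kernel lengthscale, and budget are all illustrative assumptions, not BOFFIN's actual choices):

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel between two sets of 1-D points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xq, noise=1e-6):
    """Gaussian-process posterior mean and std at query points Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)                          # (n, q)
    mu = Ks.T @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.einsum('ij,ij->j', Ks, v)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

_erf = np.vectorize(erf)

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + _erf(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (best - mu) * cdf + sigma * pdf

def finetune_loss(lr):
    """Toy stand-in for a held-out quality metric of a voice fine-tuned
    with learning rate `lr`; the real objective is expensive to evaluate,
    which is exactly why a surrogate model pays off."""
    return (lr - 0.3) ** 2

grid = np.linspace(0.0, 1.0, 201)
X = np.array([0.1, 0.9])                 # two initial configurations
y = finetune_loss(X)
for _ in range(8):                       # fit surrogate, pick config, evaluate
    mu, sigma = gp_posterior(X, y, grid)
    nxt = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, nxt), np.append(y, finetune_loss(nxt))
best_lr = X[np.argmin(y)]
```

In the few-shot TTS setting the 1-D learning rate would be replaced by the full fine-tuning hyperparameter vector, but the fit/acquire/evaluate loop is the same.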

Towards achieving robust universal neural vocoding

1 code implementation4 Jul 2019 Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal

This vocoder is shown to generate speech of consistently good quality (98% relative mean MUSHRA compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario, provided the recording conditions are studio-quality.

Robust universal neural vocoding

8 code implementations15 Nov 2018 Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote

This paper introduces a robust universal neural vocoder trained with 74 speakers of both genders, coming from 17 languages.
