Search Results for author: Roberto Barra-Chicote

Found 24 papers, 3 papers with code

Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations

no code implementations5 Feb 2024 Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Iván Vallés-Pérez, Biel Tura-Vecino, Piotr Biliński, Mateusz Lajszczak, Grzegorz Beringer, Roberto Barra-Chicote, Jaime Lorenzo-Trueba

Using speaker-disentangled codes to train LLMs for text-to-speech (TTS) allows the LLM to generate the content and the style of the speech only from the text, similarly to humans, while the speaker identity is provided by the decoder of the VC model.

In-Context Learning, Voice Conversion

Creating New Voices using Normalizing Flows

no code implementations22 Dec 2023 Piotr Bilinski, Thomas Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote, Daniel Korzekwa

As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities.

Speech Synthesis, Voice Conversion

SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces

no code implementations23 Jul 2023 Ivan Vallés-Pérez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote

We train a CLIP-based model with the aim to learn shared representations of phonetic and acoustic spaces.
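The abstract does not spell out the training objective, but a CLIP-based model of paired spaces typically means a symmetric contrastive (InfoNCE) loss over a batch of matched phonetic/acoustic embedding pairs. A minimal sketch, assuming hypothetical pre-computed embeddings (the encoders themselves are omitted):

```python
import numpy as np

def clip_style_loss(phon_emb, acou_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired phonetic/acoustic
    embeddings, CLIP-style. Matched pairs sit on the diagonal of the
    cosine-similarity matrix; the loss pulls them together and pushes
    mismatched pairs apart."""
    # L2-normalize so the dot product is cosine similarity.
    p = phon_emb / np.linalg.norm(phon_emb, axis=1, keepdims=True)
    a = acou_emb / np.linalg.norm(acou_emb, axis=1, keepdims=True)
    logits = p @ a.T / temperature            # (batch, batch)
    labels = np.arange(len(logits))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Cross-entropy in both directions: phonetic->acoustic and back.
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
# Perfectly aligned pairs give a near-zero loss...
aligned = clip_style_loss(emb, emb)
# ...while unrelated embeddings give a loss near log(batch_size).
mismatched = clip_style_loss(emb, rng.normal(size=(8, 16)))
```

The temperature and embedding dimension here are illustrative defaults, not values from the paper.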

Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows

no code implementations10 Nov 2022 Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa

Regional accents of the same language affect not only how words are pronounced (i.e., phonetic content), but also prosodic aspects of speech such as speaking rate and intonation.

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

no code implementations4 Nov 2022 Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

By fine-tuning an ASR model on synthetic stuttered speech, we are able to reduce the word error rate by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation on fluent utterances.

Automatic Speech Recognition (ASR), +1
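The "relative" qualifier in the abstract's 5.7% figure is worth unpacking: it is a fraction of the baseline error rate, not an absolute percentage-point change. A small arithmetic sketch (the 20.0% baseline WER below is hypothetical, chosen only for illustration):

```python
def relative_reduction(baseline, improved):
    """Relative error reduction in percent, i.e. what '5.7% relative' means:
    the fraction of the baseline error that was removed."""
    return 100.0 * (baseline - improved) / baseline

# Hypothetical illustration: if the baseline WER on stuttered utterances
# were 20.0%, a 5.7% relative reduction lowers it to 20.0 * (1 - 0.057),
# i.e. about 18.86% absolute -- not to 14.3%.
improved_wer = 20.0 * (1 - 0.057)
```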

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

no code implementations4 Jul 2022 Magdalena Proszewska, Grzegorz Beringer, Daniel Sáez-Trigueros, Thomas Merritt, Abdelhamid Ezzerg, Roberto Barra-Chicote

We evaluate our models in terms of intelligibility, speaker similarity and naturalness for intra- and cross-lingual conversion in seen and unseen languages.

Voice Conversion

Prosodic Alignment for off-screen automatic dubbing

no code implementations6 Apr 2022 Yogesh Virkar, Marcello Federico, Robert Enyedi, Roberto Barra-Chicote

The goal of automatic dubbing is to perform speech-to-speech translation while achieving audiovisual coherence.

Speech-to-Speech Translation, Translation

Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

no code implementations16 Feb 2022 Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

It uses voice conversion (VC) as a post-processing module appended to a pre-existing high-quality TTS system and marks a conceptual shift in the existing TTS paradigm, framing the few-shot TTS problem as a VC task.

Speech Synthesis, Voice Conversion

Machine Translation Verbosity Control for Automatic Dubbing

no code implementations8 Oct 2021 Surafel M. Lakew, Marcello Federico, Yue Wang, Cuong Hoang, Yogesh Virkar, Roberto Barra-Chicote, Robert Enyedi

Automatic dubbing aims at seamlessly replacing the speech in a video document with synthetic speech in a different language.

Machine Translation, Translation

Improving the expressiveness of neural vocoding with non-affine Normalizing Flows

no code implementations16 Jun 2021 Adam Gabryś, Yunlong Jiao, Viacheslav Klimkov, Daniel Korzekwa, Roberto Barra-Chicote

In the waveform reconstruction task, the proposed model closes the naturalness and signal quality gap from the original Parallel WaveNet (PW) to recordings by $10\%$, and from other state-of-the-art neural vocoding systems by more than $60\%$.
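The paper's exact transform is not given in this snippet, but the core idea of swapping affine for non-affine flow steps can be illustrated with a toy elementwise map that stays invertible and keeps a tractable log-determinant. This is an assumption-laden sketch, not the model's actual layer:

```python
import numpy as np

def affine_step(x, a, b):
    """Affine flow step y = a*x + b; elementwise log|dy/dx| = log|a|."""
    return a * x + b, np.full_like(x, np.log(abs(a)))

def nonaffine_step(x, alpha):
    """A non-affine yet invertible elementwise map: y = x + alpha*tanh(x).
    For 0 <= alpha < 1 the derivative 1 + alpha*(1 - tanh(x)^2) is strictly
    positive, so the map is monotonic and the log-determinant is tractable,
    yet the transform is more expressive than a scale-and-shift."""
    y = x + alpha * np.tanh(x)
    logdet = np.log1p(alpha * (1.0 - np.tanh(x) ** 2))
    return y, logdet

def invert_nonaffine(y, alpha, iters=80):
    """Invert y = x + alpha*tanh(x) by bisection (no closed-form inverse)."""
    lo, hi = y - abs(alpha), y + abs(alpha)   # tanh is bounded by +/- 1
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        too_low = mid + alpha * np.tanh(mid) < y
        lo = np.where(too_low, mid, lo)
        hi = np.where(too_low, hi, mid)
    return (lo + hi) / 2.0

x = np.linspace(-3.0, 3.0, 11)
y, logdet = nonaffine_step(x, alpha=0.5)
x_back = invert_nonaffine(y, alpha=0.5)
```

The trade-off this illustrates: the non-affine step's inverse needs an iterative solve, which is one reason affine couplings are the default in flow-based vocoders.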

SynthASR: Unlocking Synthetic Data for Speech Recognition

no code implementations14 Jun 2021 Amin Fazel, Wei Yang, YuLan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo

Our observations show that SynthASR holds great promise in training the state-of-the-art large-scale E2E ASR models for new applications while reducing the costs and dependency on production data.

Automatic Speech Recognition (ASR), +3

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows

no code implementations10 Jun 2021 Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo

This paper proposes a new neural text-to-speech model that approaches the disentanglement problem by conditioning a Tacotron2-like architecture on flow-normalized speaker embeddings, and by substituting the reference encoder with a new learned latent distribution that models intra-sentence prosodic variability.

Disentanglement, Sentence
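"Flow-normalizing" an embedding usually means passing it through a stack of invertible steps trained to map it toward a standard Gaussian. One standard building block is the RealNVP-style affine coupling layer; a minimal sketch with toy linear scale/shift nets (real models use small MLPs, and this is not claimed to be the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """One RealNVP-style coupling step: half of the embedding passes
    through unchanged, the other half is scaled and shifted by functions
    of the kept half. The step is exactly invertible and its
    log-determinant is just the sum of the log-scales."""

    def __init__(self, dim):
        self.half = dim // 2
        # Toy linear "networks" producing scale and shift.
        self.ws = rng.normal(scale=0.1, size=(self.half, dim - self.half))
        self.wt = rng.normal(scale=0.1, size=(self.half, dim - self.half))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = x1 @ self.ws, x1 @ self.wt
        y2 = x2 * np.exp(s) + t
        logdet = s.sum(axis=1)
        return np.concatenate([x1, y2], axis=1), logdet

    def inverse(self, y):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = y1 @ self.ws, y1 @ self.wt
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)

layer = AffineCoupling(dim=8)
emb = rng.normal(size=(4, 8))          # hypothetical speaker embeddings
z, logdet = layer.forward(emb)         # "normalized" embeddings
recovered = layer.inverse(z)           # exact inversion
```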

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

no code implementations29 Dec 2020 Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS).

Data Augmentation

Parallel WaveNet conditioned on VAE latent vectors

no code implementations17 Dec 2020 Jonas Rohnke, Tom Merritt, Jaime Lorenzo-Trueba, Adam Gabrys, Vatsal Aggarwal, Alexis Moinet, Roberto Barra-Chicote

In this paper we investigate the use of a sentence-level conditioning vector to improve the signal quality of a Parallel WaveNet neural vocoder.

Sentence, Speech Synthesis, +1
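A sentence-level conditioning vector drawn from a VAE latent is typically sampled with the reparameterization trick and regularized by a closed-form KL term toward a standard Gaussian. A small sketch under those standard-VAE assumptions (the posterior parameters below are hypothetical, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, diag(exp(logvar))) via the reparameterization
    trick, z = mu + sigma * eps, so gradients can flow through sampling."""
    eps = rng.normal(size=np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar))

# Hypothetical sentence-level posterior parameters from an utterance encoder.
mu = np.array([0.2, -0.1, 0.05])
logvar = np.array([-1.0, -1.2, -0.8])
z = reparameterize(mu, logvar)          # the vector conditioning the vocoder
kl = kl_to_standard_normal(mu, logvar)  # regularizer added to the loss
```

At synthesis time the mean (or a prior sample) would stand in for z, which is what makes a sentence-level latent usable as a quality-control knob.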

BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization

no code implementations4 Feb 2020 Henry B. Moss, Vatsal Aggarwal, Nishant Prateek, Javier González, Roberto Barra-Chicote

We present BOFFIN TTS (Bayesian Optimization For FIne-tuning Neural Text To Speech), a novel approach for few-shot speaker adaptation.

Bayesian Optimization
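The abstract names Bayesian optimization as the fine-tuning machinery but not its components; a common instantiation is a Gaussian-process surrogate with an expected-improvement acquisition. A self-contained sketch on a toy 1-D "learning-rate vs. loss" objective (the objective, kernel lengthscale, and budget are all illustrative assumptions, not BOFFIN's actual choices):

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel between two sets of 1-D points."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xq, noise=1e-6):
    """Gaussian-process posterior mean and std at query points Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)                          # (n, q)
    mu = Ks.T @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    var = 1.0 - np.einsum('ij,ij->j', Ks, v)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

_erf = np.vectorize(erf)

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + _erf(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (best - mu) * cdf + sigma * pdf

def finetune_loss(lr):
    """Toy stand-in for a held-out quality metric of a voice fine-tuned
    with learning rate `lr`; the real objective is expensive to evaluate,
    which is exactly why a surrogate model pays off."""
    return (lr - 0.3) ** 2

grid = np.linspace(0.0, 1.0, 201)
X = np.array([0.1, 0.9])                 # two initial configurations
y = finetune_loss(X)
for _ in range(8):                       # fit surrogate, pick config, evaluate
    mu, sigma = gp_posterior(X, y, grid)
    nxt = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, nxt), np.append(y, finetune_loss(nxt))
best_lr = X[np.argmin(y)]
```

In the few-shot TTS setting the 1-D learning rate would be replaced by the full fine-tuning hyperparameter vector, but the fit/acquire/evaluate loop is the same.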

Towards achieving robust universal neural vocoding

1 code implementation4 Jul 2019 Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal

This vocoder is shown to generate speech of consistently good quality (98% relative mean MUSHRA compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario, provided the recording conditions are studio-quality.

Robust universal neural vocoding

8 code implementations15 Nov 2018 Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote

This paper introduces a robust universal neural vocoder trained with 74 speakers of both genders, coming from 17 languages.
