Search Results for author: Daniel Korzekwa

Found 16 papers, 0 papers with code

AE-Flow: AutoEncoder Normalizing Flow

no code implementations • 27 Dec 2023 • Jakub Mosiński, Piotr Biliński, Thomas Merritt, Abdelhamid Ezzerg, Daniel Korzekwa

The results show that the proposed training paradigm systematically improves speaker similarity and naturalness when compared to regular training methods of normalizing flows.

Voice Conversion

Creating New Voices using Normalizing Flows

no code implementations • 22 Dec 2023 • Piotr Bilinski, Thomas Merritt, Abdelhamid Ezzerg, Kamil Pokora, Sebastian Cygert, Kayoko Yanagisawa, Roberto Barra-Chicote, Daniel Korzekwa

As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities.

Speech Synthesis, Voice Conversion

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

no code implementations • 31 Jul 2023 • Guangyan Zhang, Thomas Merritt, Manuel Sam Ribeiro, Biel Tura-Vecino, Kayoko Yanagisawa, Kamil Pokora, Abdelhamid Ezzerg, Sebastian Cygert, Ammar Abbas, Piotr Bilinski, Roberto Barra-Chicote, Daniel Korzekwa, Jaime Lorenzo-Trueba

Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space.

Acoustic Modelling Speech Synthesis +1

On granularity of prosodic representations in expressive text-to-speech

no code implementations • 26 Jan 2023 • Mikolaj Babianski, Kamil Pokora, Raahil Shah, Rafal Sienkiewicz, Daniel Korzekwa, Viacheslav Klimkov

In expressive speech synthesis, latent prosody representations are widely used to deal with variability of the data during training.

Expressive Speech Synthesis

Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows

no code implementations • 10 Nov 2022 • Abdelhamid Ezzerg, Thomas Merritt, Kayoko Yanagisawa, Piotr Bilinski, Magdalena Proszewska, Kamil Pokora, Renard Korzeniowski, Roberto Barra-Chicote, Daniel Korzekwa

Regional accents of the same language affect not only how words are pronounced (i.e., phonetic content) but also prosodic aspects of speech such as speaking rate and intonation.

Automated detection of pronunciation errors in non-native English speech employing deep learning

no code implementations • 13 Sep 2022 • Daniel Korzekwa

One of the problems with existing CAPT methods is the low availability of annotated mispronounced speech needed for reliable training of pronunciation error detection models.

Speech Synthesis

Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need

no code implementations • 2 Jul 2022 • Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman, Bozena Kostek

We show that these techniques not only improve the accuracy of three machine learning models for detecting pronunciation errors but also help establish a new state-of-the-art in the field.

Speech Synthesis

Text-free non-parallel many-to-many voice conversion using normalising flows

no code implementations • 15 Mar 2022 • Thomas Merritt, Abdelhamid Ezzerg, Piotr Biliński, Magdalena Proszewska, Kamil Pokora, Roberto Barra-Chicote, Daniel Korzekwa

We investigate normalising flows for VC in both text-conditioned and text-free scenarios.

Normalising Flows Speech Synthesis +2

Enhancing audio quality for expressive Neural Text-to-Speech

no code implementations • 13 Aug 2021 • Abdelhamid Ezzerg, Adam Gabrys, Bartosz Putrycz, Daniel Korzekwa, Daniel Saez-Trigueros, David McHardy, Kamil Pokora, Jakub Lachowicz, Jaime Lorenzo-Trueba, Viacheslav Klimkov

Artificial speech synthesis has made a great leap in terms of naturalness as recent Text-to-Speech (TTS) systems are capable of producing speech with similar quality to human recordings.

Acoustic Modelling, Speech Synthesis

Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech

no code implementations • 24 Jun 2021 • Raahil Shah, Kamil Pokora, Abdelhamid Ezzerg, Viacheslav Klimkov, Goeric Huybrechts, Bartosz Putrycz, Daniel Korzekwa, Thomas Merritt

In this paper, we present a method for building highly expressive TTS voices with as little as 15 minutes of speech data from the target speaker.

Generative Adversarial Network

Improving the expressiveness of neural vocoding with non-affine Normalizing Flows

no code implementations • 16 Jun 2021 • Adam Gabryś, Yunlong Jiao, Viacheslav Klimkov, Daniel Korzekwa, Roberto Barra-Chicote

In the waveform reconstruction task, the proposed model closes the naturalness and signal quality gap from the original PW to recordings by $10\%$, and from other state-of-the-art neural vocoding systems by more than $60\%$.

Weakly-supervised word-level pronunciation error detection in non-native English speech

no code implementations • 7 Jun 2021 • Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman, Shira Calamaro, Bozena Kostek

To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words.

Universal Neural Vocoding with Parallel WaveNet

no code implementations • 1 Feb 2021 • Yunlong Jiao, Adam Gabrys, Georgi Tinchev, Bartosz Putrycz, Daniel Korzekwa, Viacheslav Klimkov

We present a universal neural vocoder based on Parallel WaveNet, with an additional conditioning network called Audio Encoder.

Speech Synthesis

Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling

no code implementations • 16 Jan 2021 • Daniel Korzekwa, Jaime Lorenzo-Trueba, Szymon Zaporowski, Shira Calamaro, Thomas Drugman, Bozena Kostek

A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker.

Automatic Phoneme Recognition Sentence +1

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

no code implementations • 29 Dec 2020 • Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS).

Data Augmentation

Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech

no code implementations • 10 Jul 2019 • Daniel Korzekwa, Roberto Barra-Chicote, Bozena Kostek, Thomas Drugman, Mateusz Lajszczak

This paper proposes a novel approach for the detection and reconstruction of dysarthric speech.
