no code implementations • 19 Dec 2023 • Lingjun Meng, Jozef Coldenhoff, Paul Kendrick, Tijana Stojkovic, Andrew Harper, Kiril Ratmanski, Milos Cernak
We first provide a consolidated view of the roles of the gain power factor, the post-filter, and the training labels in the Mel-scale masking model.
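The general idea of Mel-scale masking with a gain power factor can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the function name, the mask projection, and the `power` default are assumptions.

```python
import numpy as np

def apply_mel_mask(noisy_mag, mel_mask, mel_fbank, power=0.5):
    # Project the Mel-domain mask back to linear-frequency bins.
    # mel_fbank: (n_mels, n_fft_bins); mel_mask: (n_mels, n_frames)
    lin_mask = np.clip(mel_fbank.T @ mel_mask, 0.0, 1.0)
    # 'power' plays the role of a gain power factor: exponents below 1
    # soften the mask, trading noise suppression for fewer artifacts.
    return noisy_mag * lin_mask ** power
```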
1 code implementation • 21 Sep 2023 • Boris Bergsma, Marta Brzezinska, Oleg V. Yazyev, Milos Cernak
In this work, we introduce, for the first time in the audio domain, k-means clustering as a method for efficient data pruning.
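A minimal sketch of k-means-based data pruning: cluster the sample embeddings and keep, per cluster, the samples closest to the centroid. The selection criterion and all parameter names here are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def kmeans_prune(embeddings, n_clusters=10, keep_fraction=0.5, n_iter=20, seed=0):
    # Plain k-means on the sample embeddings.
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = embeddings[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    # Keep, per cluster, the samples closest to the centroid.
    dist_to_own = np.linalg.norm(embeddings - centroids[labels], axis=-1)
    keep = []
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        n_keep = max(1, int(round(keep_fraction * len(idx))))
        keep.extend(idx[np.argsort(dist_to_own[idx])[:n_keep]])
    return np.sort(np.array(keep, dtype=int))
```

The returned indices define the pruned training subset; keeping centroid-near samples is only one plausible criterion (the opposite, keeping hard outliers, is also used in the pruning literature).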
no code implementations • 21 Sep 2023 • Jozef Coldenhoff, Andrew Harper, Paul Kendrick, Tijana Stojkovic, Milos Cernak
Previous methods for predicting room acoustic parameters and speech quality metrics have focused on the single-channel case, where room acoustics and Mean Opinion Score (MOS) are predicted for a single recording device.
no code implementations • 5 Sep 2023 • Philipp Schilk, Niccolò Polvani, Andrea Ronco, Milos Cernak, Michele Magno
Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications.
no code implementations • 9 Jun 2023 • Zihan Wu, Neil Scheidwasser-Clow, Karl El Hajal, Milos Cernak
However, the benchmark evaluates performance on each dataset separately and does not assess generalization across different types of stress and different languages.
no code implementations • 1 Jun 2023 • Bohan Wang, Damien Ronssin, Milos Cernak
This paper presents ALO-VC, a non-parallel, low-latency, one-shot voice conversion method based on phonetic posteriorgrams (PPGs).
no code implementations • 6 Dec 2022 • Niccolò Polvani, Damien Ronssin, Milos Cernak
Voice Activity Detection (VAD) is a fundamental module in many audio applications.
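For illustration of what a VAD module outputs, here is a classical energy-based baseline (frame-level speech/non-speech flags); the VAD in the paper is a learned model, and the frame sizes and threshold below are arbitrary choices.

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    # Toy energy-based VAD: a frame counts as speech when its log
    # energy exceeds a threshold relative to the signal peak.
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    flags = np.zeros(n_frames, dtype=bool)
    peak = np.max(np.abs(signal)) + 1e-12
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
        flags[i] = 20 * np.log10(rms / peak) > threshold_db
    return flags
```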
no code implementations • 12 Nov 2022 • Karl El Hajal, Zihan Wu, Neil Scheidwasser-Clow, Gasser Elbanna, Milos Cernak
Automatic speech quality assessment is essential for audio researchers, developers, speech and language pathologists, and system quality engineers.
1 code implementation • 24 Jun 2022 • Gasser Elbanna, Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Karl El Hajal, Milos Cernak
Our results indicate that the hybrid model with a convolutional transformer as the encoder yields superior performance in most HEAR challenge tasks.
Ranked #1 on Self-Supervised Learning on CREMA-D
no code implementations • 4 Apr 2022 • Karl El Hajal, Milos Cernak, Pablo Mainar
The acoustic environment can degrade speech quality during communication (e.g., video calls, remote presentations, outdoor voice recordings), and its impact is often unknown.
1 code implementation • 30 Mar 2022 • Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To that end, we introduce a set of five datasets for task load detection in speech.
no code implementations • 12 Nov 2021 • Damien Ronssin, Milos Cernak
This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic posteriorgram-based voice conversion system that can perform any-to-many voice conversion with only 57.5 ms of future look-ahead.
2 code implementations • 7 Oct 2021 • Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To facilitate the process, here, we present the Speech Emotion Recognition Adaptation Benchmark (SERAB), a framework for evaluating the performance and generalization capacity of different approaches for utterance-level SER.
no code implementations • 7 Oct 2021 • Boris Bergsma, Minhao Yang, Milos Cernak
At the end of Moore's law, new computing paradigms are required to prolong the battery life of wearable and IoT smart audio devices.
no code implementations • 29 Sep 2021 • Paula Sánchez López, Paul Callens, Milos Cernak
Speech audio quality is subject to degradation caused by an acoustic environment and isotropic ambient and point noises.
no code implementations • 21 Oct 2020 • Paul Callens, Milos Cernak
Acoustic environment characterization opens doors for sound reproduction innovations, smart EQing, speech enhancement, hearing aids, and forensics.
no code implementations • 19 Oct 2020 • Alexandru Mocanu, Benjamin Ricaud, Milos Cernak
Music source separation represents the task of extracting all the instruments from a given song.
no code implementations • 8 Oct 2020 • Oriol Barbany Mayor, Milos Cernak
Despite the simple structure of the proposed model, it outperforms the VC Challenge 2020 baselines on the cross-lingual task in terms of naturalness.
2 code implementations • 22 Oct 2019 • Pierre Beckmann, Mikolaj Kegler, Milos Cernak
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.
Automatic Speech Recognition (ASR) +7
no code implementations • 22 Oct 2019 • Flavio Martinelli, Giorgia Dellaferrera, Pablo Mainar, Milos Cernak
We describe an SNN training procedure that achieves low spiking activity, together with pruning algorithms that remove 85% of the network connections with no performance loss.
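Removing a fixed fraction of connections is commonly done by magnitude pruning; the sketch below illustrates that generic idea at 85% sparsity. It is not the paper's SNN-specific method, which may use other criteria (e.g., spiking activity).

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.85):
    # Zero out the smallest-magnitude fraction of connections.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # surviving connections
    return weights * mask, mask
```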
2 code implementations • 20 Oct 2019 • Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of time-frequency representation of speech.
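The inpainting task setup can be sketched as masking a time-frequency region and asking a model to reconstruct it from the surrounding context. The helper below only prepares such a masked input; its name and the fill value are assumptions.

```python
import numpy as np

def mask_time_frames(spec, t_start, t_width, fill=0.0):
    # Blank a contiguous block of time frames; an inpainting model
    # must then reconstruct them from the surrounding context.
    masked = spec.copy()
    mask = np.zeros_like(spec, dtype=bool)
    mask[:, t_start:t_start + t_width] = True
    masked[mask] = fill
    return masked, mask
```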
no code implementations • 15 Apr 2016 • Milos Cernak, Alexandros Lazaridis, Afsaneh Asaei, Philip N. Garner
Segmental errors are further propagated to optional suprasegmental (e.g., syllable-level) information coding.
no code implementations • 22 Jan 2016 • Milos Cernak, Stefan Benus, Alexandros Lazaridis
Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing, and in broader terms, for exploring relations between the abstract and physical structures of a speech signal.
no code implementations • 21 Jan 2016 • Milos Cernak, Afsaneh Asaei, Hervé Bourlard
Building on findings from converging linguistic evidence on the gestural model of Articulatory Phonology as well as the neural basis of speech perception, we hypothesize that phonological posteriors convey properties of linguistic classes at multiple time scales, and this information is embedded in their support (index) of active coefficients.
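The "support of active coefficients" can be made concrete as a thresholded index set per frame, optionally computed at a coarser time scale by averaging posteriors over a window. The threshold and windowing scheme here are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def posterior_support(posteriors, threshold=0.01, window=1):
    # posteriors: (n_frames, n_classes) phonological class probabilities.
    # Averaging over `window` frames gives the support at a coarser
    # time scale; threshold marks a coefficient as 'active'.
    n_frames, n_classes = posteriors.shape
    n_out = n_frames // window
    coarse = posteriors[: n_out * window].reshape(n_out, window, n_classes).mean(axis=1)
    return coarse > threshold
```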