Search Results for author: Milos Cernak

Found 24 papers, 6 papers with code

On real-time multi-stage speech enhancement systems

no code implementations • 19 Dec 2023 • Lingjun Meng, Jozef Coldenhoff, Paul Kendrick, Tijana Stojkovic, Andrew Harper, Kiril Ratmanski, Milos Cernak

We first provide a consolidated view of the roles of gain power factor, post-filter, and training labels for the Mel-scale masking model.

Speech Enhancement
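The gain power factor mentioned in the entry above can be illustrated with a minimal sketch, assuming a Mel-domain mask predicted by some masking model and a pseudo-inverse projection back to linear frequency; the filterbank setup and the power value are illustrative assumptions, not the configuration studied in the paper.

```python
import numpy as np
import librosa

def apply_mel_mask(noisy_mag, mel_mask, sr=16000, n_fft=512, power=0.5):
    """Apply a Mel-domain mask to a linear-frequency magnitude spectrogram.

    noisy_mag: (n_fft//2 + 1, frames) noisy magnitude spectrogram
    mel_mask:  (n_mels, frames) mask in [0, 1] predicted by a masking model
    power:     gain power factor (illustrative value)
    """
    n_mels = mel_mask.shape[0]
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # (n_mels, n_fft//2 + 1)
    # Project the Mel-domain mask back to linear frequency (pseudo-inverse, an assumption)
    lin_mask = np.clip(np.linalg.pinv(mel_fb) @ mel_mask, 0.0, 1.0)
    # The gain power factor trades off noise suppression against speech distortion
    return noisy_mag * (lin_mask ** power)

# Usage with random stand-ins: (257, T) noisy magnitudes and a (40, T) Mel mask
enhanced = apply_mel_mask(np.random.rand(257, 120), np.random.rand(40, 120))
```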

Cluster-based pruning techniques for audio data

1 code implementation • 21 Sep 2023 • Boris Bergsma, Marta Brzezinska, Oleg V. Yazyev, Milos Cernak

In this work, we introduce, for the first time in the audio domain, k-means clustering as a method for efficient data pruning.

Clustering • Keyword Spotting
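A minimal sketch of the cluster-based pruning idea above, assuming per-clip audio embeddings are already available; the cluster count, keep fraction, and the keep-closest-to-centroid criterion are illustrative choices, not necessarily the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

def prune_dataset(embeddings, keep_fraction=0.5, n_clusters=64, seed=0):
    """Return indices of a pruned training subset chosen via k-means clustering."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(embeddings)
    # Distance of every example to its assigned cluster centroid
    dists = np.linalg.norm(embeddings - km.cluster_centers_[km.labels_], axis=1)

    keep = []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        n_keep = max(1, int(keep_fraction * len(idx)))
        # One possible criterion: keep the examples closest to the centroid
        keep.extend(idx[np.argsort(dists[idx])[:n_keep]])
    return np.sort(np.array(keep))

# Usage with stand-in embeddings (e.g., from a pretrained audio encoder)
subset = prune_dataset(np.random.randn(5000, 256), keep_fraction=0.3)
```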

Multi-Channel MOSRA: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and a Teacher Model

no code implementations • 21 Sep 2023 • Jozef Coldenhoff, Andrew Harper, Paul Kendrick, Tijana Stojkovic, Milos Cernak

Previous methods for predicting room acoustic parameters and speech quality metrics have focused on the single-channel case, where room acoustics and Mean Opinion Score (MOS) are predicted for a single recording device.

Descriptive

In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms

no code implementations • 5 Sep 2023 • Philipp Schilk, Niccolò Polvani, Andrea Ronco, Milos Cernak, Michele Magno

Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications.

Action Detection • Activity Detection

Speaker Embeddings as Individuality Proxy for Voice Stress Detection

no code implementations • 9 Jun 2023 • Zihan Wu, Neil Scheidwasser-Clow, Karl El Hajal, Milos Cernak

However, the benchmark only evaluates performance separately on each dataset; it does not evaluate performance across the different types of stress and different languages.

ALO-VC: Any-to-any Low-latency One-shot Voice Conversion

no code implementations • 1 Jun 2023 • Bohan Wang, Damien Ronssin, Milos Cernak

This paper presents ALO-VC, a non-parallel, low-latency, one-shot voice conversion method based on phonetic posteriorgrams (PPGs).

Voice Conversion

Efficient Speech Quality Assessment using Self-supervised Framewise Embeddings

no code implementations • 12 Nov 2022 • Karl El Hajal, Zihan Wu, Neil Scheidwasser-Clow, Gasser Elbanna, Milos Cernak

Automatic speech quality assessment is essential for audio researchers, developers, speech and language pathologists, and system quality engineers.

BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping

1 code implementation • 24 Jun 2022 • Gasser Elbanna, Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Karl El Hajal, Milos Cernak

Our results indicate that the hybrid model with a convolutional transformer as the encoder yields superior performance in most HEAR challenge tasks.

Scene Classification • Self-Supervised Learning
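The bootstrapping recipe behind BYOL-S can be sketched as follows, with a stand-in MLP encoder instead of the convolutional transformer reported in the paper; the projector/predictor sizes and the EMA rate are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Stand-in encoder; the paper's hybrid uses a convolutional transformer."""
    def __init__(self, in_dim=64, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))
    def forward(self, x):
        return self.net(x)

def byol_loss(p, z):
    # Negative cosine similarity; the target branch receives no gradient
    return 2 - 2 * F.cosine_similarity(p, z.detach(), dim=-1).mean()

online = Encoder()
predictor = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
target = copy.deepcopy(online)  # target network starts as a copy of the online network
opt = torch.optim.Adam(list(online.parameters()) + list(predictor.parameters()), lr=1e-3)

def train_step(view1, view2, tau=0.99):
    loss = byol_loss(predictor(online(view1)), target(view2)) + \
           byol_loss(predictor(online(view2)), target(view1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Exponential moving average update of the target network
    with torch.no_grad():
        for p_t, p_o in zip(target.parameters(), online.parameters()):
            p_t.mul_(tau).add_((1 - tau) * p_o)
    return loss.item()

# Example: two augmented "views" of a batch of spectrogram frames (random stand-ins)
loss = train_step(torch.randn(8, 64), torch.randn(8, 64))
```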

MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment

no code implementations • 4 Apr 2022 • Karl El Hajal, Milos Cernak, Pablo Mainar

The acoustic environment can degrade speech quality during communication (e.g., video calls, remote presentations, outdoor voice recordings), and its impact is often unknown.

AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion

no code implementations • 12 Nov 2021 • Damien Ronssin, Milos Cernak

This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic posteriorgram-based voice conversion system that can perform any-to-many voice conversion with only 57.5 ms of future look-ahead.

Voice Conversion
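The limited future look-ahead of an almost-causal system can be illustrated with an asymmetrically padded 1-D convolution that sees only a few future frames; the kernel size, hop duration, and layer count below are illustrative and do not reproduce the paper's 57.5 ms figure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlmostCausalConv1d(nn.Module):
    """Conv1d that looks at `future` frames ahead and the rest in the past."""
    def __init__(self, channels, kernel_size=5, future=1):
        super().__init__()
        assert 0 <= future < kernel_size
        self.past, self.future = kernel_size - 1 - future, future
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                       # x: (batch, channels, frames)
        x = F.pad(x, (self.past, self.future))  # left pad = past, right pad = future
        return self.conv(x)

# With a 12.5 ms frame hop (an assumption), stacking layers with one future frame each
# gives a total algorithmic look-ahead of n_layers * 1 frame * 12.5 ms.
layers = nn.Sequential(*[AlmostCausalConv1d(80, future=1) for _ in range(4)])
ppg_like = torch.randn(1, 80, 100)              # stand-in for a PPG frame sequence
out = layers(ppg_like)
print(out.shape, "total look-ahead ~", 4 * 12.5, "ms")
```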

SERAB: A multi-lingual benchmark for speech emotion recognition

2 code implementations • 7 Oct 2021 • Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Milos Cernak

To facilitate the process, here we present the Speech Emotion Recognition Adaptation Benchmark (SERAB), a framework for evaluating the performance and generalization capacity of different approaches to utterance-level SER.

Benchmarking • Speech Emotion Recognition

PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition

no code implementations • 7 Oct 2021 • Boris Bergsma, Minhao Yang, Milos Cernak

At the end of Moore's law, new computing paradigms are required to prolong the battery life of wearable and IoT smart audio devices.

Action Detection • Activity Detection • +2

A Universal Deep Room Acoustics Estimator

no code implementations • 29 Sep 2021 • Paula Sánchez López, Paul Callens, Milos Cernak

Speech audio quality is subject to degradation caused by an acoustic environment and isotropic ambient and point noises.

Room Impulse Response (RIR)

Joint Blind Room Acoustic Characterization From Speech And Music Signals Using Convolutional Recurrent Neural Networks

no code implementations • 21 Oct 2020 • Paul Callens, Milos Cernak

Acoustic environment characterization opens doors for sound reproduction innovations, smart EQing, speech enhancement, hearing aids, and forensics.

Room Impulse Response (RIR) • Speech Enhancement

FastVC: Fast Voice Conversion with non-parallel data

no code implementations • 8 Oct 2020 • Oriol Barbany Mayor, Milos Cernak

Despite the simple structure of the proposed model, it outperforms the VC Challenge 2020 baselines on the cross-lingual task in terms of naturalness.

Voice Conversion

Spiking neural networks trained with backpropagation for low power neuromorphic implementation of voice activity detection

no code implementations • 22 Oct 2019 • Flavio Martinelli, Giorgia Dellaferrera, Pablo Mainar, Milos Cernak

We describe an SNN training procedure that achieves low spiking activity and pruning algorithms to remove 85% of the network connections with no performance loss.

Action Detection • Activity Detection
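The removal of 85% of connections can be illustrated with generic magnitude-based pruning of a weight matrix; the paper's actual pruning algorithms for spiking networks may differ, so treat this as a sketch of the general idea under that assumption.

```python
import numpy as np

def prune_connections(weights, prune_fraction=0.85):
    """Zero out the smallest-magnitude connections (generic magnitude pruning).

    weights: 2-D weight matrix of one layer; returns (pruned_weights, binary mask).
    """
    flat = np.abs(weights).ravel()
    k = int(prune_fraction * flat.size)
    # Threshold at the k-th smallest magnitude so roughly `prune_fraction` is removed
    threshold = np.sort(flat)[k - 1] if k > 0 else -np.inf
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.random.randn(128, 64)
w_pruned, mask = prune_connections(w)
print(f"kept {mask.mean():.1%} of connections")
```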

Deep speech inpainting of time-frequency masks

2 code implementations • 20 Oct 2019 • Mikolaj Kegler, Pierre Beckmann, Milos Cernak

To address these limitations, here we propose an end-to-end framework for speech inpainting: the context-based retrieval of missing or severely distorted parts of the time-frequency representation of speech.

Retrieval
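A minimal sketch of the inpainting setup described above: a rectangular time-frequency region is zeroed out, and a network would be trained to reconstruct it from the surrounding context; the mask shape and sizes are illustrative assumptions.

```python
import numpy as np

def mask_time_frequency(spec, t_start, t_len, f_start=0, f_len=None):
    """Zero out a rectangular time-frequency region; return (masked_spec, mask)."""
    f_len = spec.shape[0] - f_start if f_len is None else f_len
    mask = np.ones_like(spec)
    mask[f_start:f_start + f_len, t_start:t_start + t_len] = 0.0
    return spec * mask, mask

# Example: drop 20 frames across all frequency bins of a (257, 200) spectrogram
spec = np.random.rand(257, 200)
masked, mask = mask_time_frequency(spec, t_start=90, t_len=20)
# An inpainting network would be trained to predict `spec` from (`masked`, `mask`)
```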

Speech vocoding for laboratory phonology

no code implementations • 22 Jan 2016 • Milos Cernak, Stefan Benus, Alexandros Lazaridis

Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing, and in broader terms, for exploring relations between the abstract and physical structures of a speech signal.

Speech Synthesis

On Structured Sparsity of Phonological Posteriors for Linguistic Parsing

no code implementations • 21 Jan 2016 • Milos Cernak, Afsaneh Asaei, Hervé Bourlard

Building on findings from converging linguistic evidence on the gestural model of Articulatory Phonology as well as the neural basis of speech perception, we hypothesize that phonological posteriors convey properties of linguistic classes at multiple time scales, and this information is embedded in their support (index) of active coefficients.
