no code implementations • 19 Dec 2023 • Lingjun Meng, Jozef Coldenhoff, Paul Kendrick, Tijana Stojkovic, Andrew Harper, Kiril Ratmanski, Milos Cernak
We first provide a consolidated view of the roles of the gain power factor, the post-filter, and the training labels in the Mel-scale masking model.
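The general idea of Mel-scale masking with a gain power factor can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the function name, the mask projection, and the `power` default are assumptions.

```python
import numpy as np

def apply_mel_mask(noisy_mag, mel_mask, mel_fbank, power=0.5):
    # Project the Mel-domain mask back to linear-frequency bins.
    # mel_fbank: (n_mels, n_fft_bins); mel_mask: (n_mels, n_frames)
    lin_mask = np.clip(mel_fbank.T @ mel_mask, 0.0, 1.0)
    # 'power' plays the role of a gain power factor: exponents below 1
    # soften the mask, trading noise suppression for fewer artifacts.
    return noisy_mag * lin_mask ** power
```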
1 code implementation • 21 Sep 2023 • Boris Bergsma, Marta Brzezinska, Oleg V. Yazyev, Milos Cernak
In this work, we introduce, for the first time in the audio domain, k-means clustering as a method for efficient data pruning.
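A minimal sketch of k-means-based data pruning: cluster the sample embeddings and keep, per cluster, the samples closest to the centroid. The selection criterion and all parameter names here are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def kmeans_prune(embeddings, n_clusters=10, keep_fraction=0.5, n_iter=20, seed=0):
    # Plain k-means on the sample embeddings.
    rng = np.random.default_rng(seed)
    centroids = embeddings[rng.choice(len(embeddings), n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = embeddings[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    # Keep, per cluster, the samples closest to the centroid.
    dist_to_own = np.linalg.norm(embeddings - centroids[labels], axis=-1)
    keep = []
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        n_keep = max(1, int(round(keep_fraction * len(idx))))
        keep.extend(idx[np.argsort(dist_to_own[idx])[:n_keep]])
    return np.sort(np.array(keep, dtype=int))
```

The returned indices define the pruned training subset; keeping centroid-near samples is only one plausible criterion (the opposite, keeping hard outliers, is also used in the pruning literature).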
no code implementations • 21 Sep 2023 • Jozef Coldenhoff, Andrew Harper, Paul Kendrick, Tijana Stojkovic, Milos Cernak
Previous methods for predicting room acoustic parameters and speech quality metrics have focused on the single-channel case, where room acoustics and Mean Opinion Score (MOS) are predicted for a single recording device.
no code implementations • 5 Sep 2023 • Philipp Schilk, Niccolò Polvani, Andrea Ronco, Milos Cernak, Michele Magno
Such microphones can record the wearer's speech with much greater isolation, enabling personalized voice activity detection and further audio enhancement applications.
no code implementations • 9 Jun 2023 • Zihan Wu, Neil Scheidwasser-Clow, Karl El Hajal, Milos Cernak
However, the benchmark evaluates performance on each dataset separately and does not assess generalization across different types of stress and different languages.
no code implementations • 1 Jun 2023 • Bohan Wang, Damien Ronssin, Milos Cernak
This paper presents ALO-VC, a non-parallel, low-latency, one-shot voice conversion method based on phonetic posteriorgrams (PPGs).
no code implementations • 6 Dec 2022 • Niccolò Polvani, Damien Ronssin, Milos Cernak
Voice Activity Detection (VAD) is a fundamental module in many audio applications.
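For illustration of what a VAD module outputs, here is a classical energy-based baseline (frame-level speech/non-speech flags); the VAD in the paper is a learned model, and the frame sizes and threshold below are arbitrary choices.

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    # Toy energy-based VAD: a frame counts as speech when its log
    # energy exceeds a threshold relative to the signal peak.
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    flags = np.zeros(n_frames, dtype=bool)
    peak = np.max(np.abs(signal)) + 1e-12
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
        flags[i] = 20 * np.log10(rms / peak) > threshold_db
    return flags
```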
no code implementations • 12 Nov 2022 • Karl El Hajal, Zihan Wu, Neil Scheidwasser-Clow, Gasser Elbanna, Milos Cernak
Automatic speech quality assessment is essential for audio researchers, developers, speech and language pathologists, and system quality engineers.
1 code implementation • 24 Jun 2022 • Gasser Elbanna, Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Karl El Hajal, Milos Cernak
Our results indicate that the hybrid model with a convolutional transformer as the encoder yields superior performance in most HEAR challenge tasks.
Ranked #1 on Self-Supervised Learning on CREMA-D
no code implementations • 4 Apr 2022 • Karl El Hajal, Milos Cernak, Pablo Mainar
The acoustic environment can degrade speech quality during communication (e.g., video calls, remote presentations, outdoor voice recordings), and its impact is often unknown.
1 code implementation • 30 Mar 2022 • Gasser Elbanna, Alice Biryukov, Neil Scheidwasser-Clow, Lara Orlandic, Pablo Mainar, Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To that end, we introduce a set of five datasets for task load detection in speech.
no code implementations • 12 Nov 2021 • Damien Ronssin, Milos Cernak
This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic posteriorgram-based voice conversion system that can perform any-to-many voice conversion with only 57.5 ms of future look-ahead.
2 code implementations • 7 Oct 2021 • Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To facilitate the process, here, we present the Speech Emotion Recognition Adaptation Benchmark (SERAB), a framework for evaluating the performance and generalization capacity of different approaches for utterance-level SER.
no code implementations • 7 Oct 2021 • Boris Bergsma, Minhao Yang, Milos Cernak
At the end of Moore's law, new computing paradigms are required to prolong the battery life of wearable and IoT smart audio devices.
no code implementations • 29 Sep 2021 • Paula Sánchez López, Paul Callens, Milos Cernak
Speech audio quality is subject to degradation caused by an acoustic environment and isotropic ambient and point noises.
no code implementations • 21 Oct 2020 • Paul Callens, Milos Cernak
Acoustic environment characterization opens doors for sound reproduction innovations, smart EQing, speech enhancement, hearing aids, and forensics.
no code implementations • 19 Oct 2020 • Alexandru Mocanu, Benjamin Ricaud, Milos Cernak
Music source separation represents the task of extracting all the instruments from a given song.
no code implementations • 8 Oct 2020 • Oriol Barbany Mayor, Milos Cernak
Despite the simple structure of the proposed model, it outperforms the VC Challenge 2020 baselines on the cross-lingual task in terms of naturalness.
2 code implementations • 22 Oct 2019 • Pierre Beckmann, Mikolaj Kegler, Milos Cernak
Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer.
Automatic Speech Recognition (ASR) +7
no code implementations • 22 Oct 2019 • Flavio Martinelli, Giorgia Dellaferrera, Pablo Mainar, Milos Cernak
We describe an SNN training procedure that achieves low spiking activity, together with pruning algorithms that remove 85% of the network connections with no performance loss.
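Removing a fixed fraction of connections is commonly done by magnitude pruning; the sketch below illustrates that generic idea at 85% sparsity. It is not the paper's SNN-specific method, which may use other criteria (e.g., spiking activity).

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.85):
    # Zero out the smallest-magnitude fraction of connections.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # surviving connections
    return weights * mask, mask
```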
2 code implementations • 20 Oct 2019 • Mikolaj Kegler, Pierre Beckmann, Milos Cernak
To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of time-frequency representation of speech.
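The inpainting task setup can be sketched as masking a time-frequency region and asking a model to reconstruct it from the surrounding context. The helper below only prepares such a masked input; its name and the fill value are assumptions.

```python
import numpy as np

def mask_time_frames(spec, t_start, t_width, fill=0.0):
    # Blank a contiguous block of time frames; an inpainting model
    # must then reconstruct them from the surrounding context.
    masked = spec.copy()
    mask = np.zeros_like(spec, dtype=bool)
    mask[:, t_start:t_start + t_width] = True
    masked[mask] = fill
    return masked, mask
```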
no code implementations • 15 Apr 2016 • Milos Cernak, Alexandros Lazaridis, Afsaneh Asaei, Philip N. Garner
Segmental errors are further propagated to optional suprasegmental (e.g., syllable-level) information coding.
no code implementations • 22 Jan 2016 • Milos Cernak, Stefan Benus, Alexandros Lazaridis
Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing, and in broader terms, for exploring relations between the abstract and physical structures of a speech signal.
no code implementations • 21 Jan 2016 • Milos Cernak, Afsaneh Asaei, Hervé Bourlard
Building on findings from converging linguistic evidence on the gestural model of Articulatory Phonology as well as the neural basis of speech perception, we hypothesize that phonological posteriors convey properties of linguistic classes at multiple time scales, and this information is embedded in their support (index) of active coefficients.
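The "support of active coefficients" can be made concrete as a thresholded index set per frame, optionally computed at a coarser time scale by averaging posteriors over a window. The threshold and windowing scheme here are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def posterior_support(posteriors, threshold=0.01, window=1):
    # posteriors: (n_frames, n_classes) phonological class probabilities.
    # Averaging over `window` frames gives the support at a coarser
    # time scale; threshold marks a coefficient as 'active'.
    n_frames, n_classes = posteriors.shape
    n_out = n_frames // window
    coarse = posteriors[: n_out * window].reshape(n_out, window, n_classes).mean(axis=1)
    return coarse > threshold
```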