Search Results for author: Timo Gerkmann

Found 51 papers, 20 papers with code

Diffusion Models for Audio Restoration

no code implementations15 Feb 2024 Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, Timo Gerkmann

Here, we aim to show that diffusion models can combine the best of both worlds, offering the opportunity to design audio restoration algorithms with a good degree of interpretability and remarkable sound quality.

Speech Enhancement

An Analysis of the Variance of Diffusion-based Speech Enhancement

no code implementations1 Feb 2024 Bunlong Lay, Timo Gerkmann

Speech enhancement performance varies with the choice of stochastic differential equation, which controls how the mean and the variance evolve along the diffusion process as environmental and Gaussian noise are added.

Speech Enhancement
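
To make the dependence on the SDE concrete, the following is a minimal sketch (not the authors' code) of the closed-form perturbation mean and variance of the Ornstein-Uhlenbeck Variance-Exploding SDE commonly used in diffusion-based speech enhancement; the parameter defaults `theta`, `sigma_min`, `sigma_max` are illustrative assumptions.

```python
import numpy as np

def ouve_mean_var(x0, y, t, theta=1.5, sigma_min=0.05, sigma_max=0.5):
    """Closed-form mean and variance of the OUVE perturbation kernel (illustrative sketch).

    x0: clean speech (array), y: noisy speech (array), t: diffusion time in [0, 1].
    Parameter values are illustrative defaults, not the authors' settings.
    """
    log_ratio = np.log(sigma_max / sigma_min)
    # The mean drifts exponentially from the clean signal x0 towards the noisy signal y.
    mean = np.exp(-theta * t) * x0 + (1.0 - np.exp(-theta * t)) * y
    # The variance grows according to the variance-exploding diffusion coefficient.
    var = (sigma_min ** 2
           * ((sigma_max / sigma_min) ** (2 * t) - np.exp(-2 * theta * t))
           * log_ratio / (theta + log_ratio))
    return mean, var
```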

Single and Few-step Diffusion for Generative Speech Enhancement

1 code implementation18 Sep 2023 Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann

While the performance of typical generative diffusion algorithms drops dramatically when the number of function evaluations (NFEs) is lowered to obtain single-step diffusion, we show that our proposed method maintains a steady performance, largely outperforming the diffusion baseline in this setting and also generalizing better than its predictive counterpart.

Denoising, Speech Enhancement
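
As a rough illustration of why the NFE count matters, below is a generic Euler-Maruyama reverse sampler in which each step costs one score-network call; `score_model` and the `sde` interface are hypothetical placeholders, not the paper's implementation.

```python
import torch

@torch.no_grad()
def reverse_sample(score_model, y, sde, n_steps=1, t_eps=0.03):
    """Generic Euler-Maruyama reverse diffusion sampler (a sketch, not the paper's method).

    Each step costs one call to `score_model`, so NFEs == n_steps here.
    `score_model`, `sde.prior_sample`, `sde.drift`, `sde.diffusion` are assumed interfaces.
    """
    x = sde.prior_sample(y)                       # start from the noisy-speech-centred prior
    ts = torch.linspace(1.0, t_eps, n_steps + 1)  # integrate from t=1 down to t_eps
    for i in range(n_steps):
        t, dt = ts[i], ts[i] - ts[i + 1]          # positive step size
        score = score_model(x, y, t)              # one network evaluation (one NFE)
        drift = sde.drift(x, y, t) - sde.diffusion(t) ** 2 * score
        x = x - drift * dt                        # deterministic part of the reverse step
        if i < n_steps - 1:                       # add noise except on the final step
            x = x + sde.diffusion(t) * dt.sqrt() * torch.randn_like(x)
    return x
```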

Distilling HuBERT with LSTMs via Decoupled Knowledge Distillation

no code implementations18 Sep 2023 Danilo de Oliveira, Timo Gerkmann

Much research effort is being applied to the task of compressing the knowledge of self-supervised models, which are powerful yet large and memory-consuming.

Automatic Speech Recognition, Knowledge Distillation +2

Live Iterative Ptychography with projection-based algorithms

no code implementations14 Sep 2023 Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann

In this work, we demonstrate that the ptychographic phase problem can be solved in a live fashion during scanning, while data is still being collected.

Retrieval

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data

no code implementations14 Sep 2023 Navin Raj Prabhu, Bunlong Lay, Simon Welker, Nale Lehmann-Willenbrock, Timo Gerkmann

Subsequently, at inference, a target emotion embedding is employed to convert the emotion of the input utterance to the given target emotion.

A Flexible Online Framework for Projection-Based STFT Phase Retrieval

no code implementations13 Sep 2023 Tal Peer, Simon Welker, Johannes Kolhoff, Timo Gerkmann

Several recent contributions in the field of iterative STFT phase retrieval have demonstrated that the performance of the classical Griffin-Lim method can be considerably improved upon.

Retrieval
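
For reference, a minimal sketch of the classical Griffin-Lim baseline that these projection-based methods improve upon; the STFT parameters and the librosa-based implementation are illustrative choices, not the paper's setup.

```python
import numpy as np
import librosa

def griffin_lim(magnitude, n_iter=100, n_fft=512, hop_length=128):
    """Classical Griffin-Lim phase retrieval from an STFT magnitude (a minimal sketch).

    Alternates a consistency projection (ISTFT followed by STFT) with a measurement
    projection that restores the known magnitude. Parameters are illustrative only.
    """
    rng = np.random.default_rng(0)
    # Initialise with the measured magnitude and uniformly random phase.
    spec = magnitude * np.exp(1j * rng.uniform(0.0, 2 * np.pi, magnitude.shape))
    for _ in range(n_iter):
        x = librosa.istft(spec, hop_length=hop_length)              # back to the time domain
        spec = librosa.stft(x, n_fft=n_fft, hop_length=hop_length)  # consistency projection
        spec = magnitude * np.exp(1j * np.angle(spec))              # magnitude projection
    return librosa.istft(spec, hop_length=hop_length)
```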

Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model

1 code implementation22 Jun 2023 Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

We show that our stochastic regeneration model outperforms other neural-network-based wind noise reduction methods as well as purely predictive and generative models, on a dataset using simulated and real-recorded wind noise.

Diffusion Posterior Sampling for Informed Single-Channel Dereverberation

1 code implementation21 Jun 2023 Jean-Marie Lemercier, Simon Welker, Timo Gerkmann

We present in this paper an informed single-channel dereverberation method based on conditional generation with diffusion models.

On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings

no code implementations5 Jun 2023 Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Tal Peer, Timo Gerkmann

Since its inception, the field of deep speech enhancement has been dominated by predictive (discriminative) approaches, such as spectral mapping or masking.

Denoising, Speech Enhancement
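
To clarify what "predictive (discriminative)" masking means here, a toy sketch follows; `MaskNet` is a hypothetical stand-in network, not an architecture from the paper.

```python
import torch

class MaskNet(torch.nn.Module):
    """Toy stand-in for a predictive masking network (hypothetical, for illustration)."""
    def __init__(self, n_freq=257):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_freq, 512), torch.nn.ReLU(),
            torch.nn.Linear(512, n_freq), torch.nn.Sigmoid())  # mask values in [0, 1]

    def forward(self, noisy_mag):              # noisy_mag: (batch, frames, freq)
        return self.net(noisy_mag)

def enhance(noisy_stft, model):
    """Predictive (discriminative) masking: scale each T-F bin of the noisy STFT."""
    mask = model(noisy_stft.abs())             # estimate a real-valued mask from the magnitude
    return mask * noisy_stft                   # apply it multiplicatively, keeping the noisy phase
```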

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

no code implementations2 Jun 2023 Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann

In this work, we specifically focus on in-the-wild emotion conversion where parallel data does not exist, and the problem of disentangling lexical, speaker, and emotion information arises.

Resynthesis

Audio-Visual Speech Enhancement with Score-Based Generative Models

no code implementations2 Jun 2023 Julius Richter, Simone Frintrop, Timo Gerkmann

This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information.

Automatic Speech Recognition, Lipreading +3

Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model

1 code implementation31 May 2023 Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann

We propose Audio-Visual Lightweight ITerative model (AVLIT), an effective and lightweight neural network that uses Progressive Learning (PL) to perform audio-visual speech separation in noisy environments.

Speech Separation

Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models

no code implementations30 May 2023 Danilo de Oliveira, Navin Raj Prabhu, Timo Gerkmann

In large part due to their implicit semantic modeling, self-supervised learning (SSL) methods have significantly increased the performance of valence recognition in speech emotion recognition (SER) systems.

Self-Supervised Learning, Speech Emotion Recognition

Integrating Uncertainty into Neural Network-based Speech Enhancement

1 code implementation15 May 2023 Huajian Fang, Dennis Becker, Stefan Wermter, Timo Gerkmann

In this paper, we study the benefits of modeling uncertainty in clean speech estimation.

Speech Enhancement

Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters

no code implementations24 Apr 2023 Kristina Tesch, Timo Gerkmann

In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture.

Speech Separation

Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise

no code implementations27 Mar 2023 Huajian Fang, Niklas Wittmer, Johannes Twiefel, Stefan Wermter, Timo Gerkmann

In this paper, we propose a multichannel, partially adaptive scheme that jointly models ego-noise and environmental noise using the VAE-NMF framework: we exploit the spatially and spectrally structured characteristics of ego-noise by pre-training the ego-noise model, while retaining the ability to adapt to unknown environmental noise.
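
As a rough sketch of the "partially adaptive" idea, the snippet below uses plain semi-supervised NMF with a fixed pre-trained ego-noise dictionary and freely adapted environmental-noise atoms; the authors' actual model is VAE-NMF-based, so this only illustrates the general principle.

```python
import numpy as np

def partially_adaptive_nmf(V, W_ego, n_env=8, n_iter=200, eps=1e-10):
    """Semi-supervised NMF sketch of the 'partially adaptive' idea (not the authors' VAE-NMF).

    V:      noisy power spectrogram, shape (freq, frames)
    W_ego:  pre-trained ego-noise dictionary, kept fixed, shape (freq, k_ego)
    n_env:  number of freely adapted components for unknown environmental noise
    """
    rng = np.random.default_rng(0)
    F, T = V.shape
    k_ego = W_ego.shape[1]
    W_env = rng.random((F, n_env)) + eps          # adaptive environmental-noise atoms
    H = rng.random((k_ego + n_env, T)) + eps      # activations for all atoms
    for _ in range(n_iter):
        W = np.hstack([W_ego, W_env])             # fixed part + adaptive part
        # Multiplicative updates for the Euclidean cost ||V - WH||^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        WH = W @ H
        # Only the environmental atoms are updated; the ego-noise atoms stay pre-trained.
        W_env *= (V @ H[k_ego:].T) / (WH @ H[k_ego:].T + eps)
    return np.hstack([W_ego, W_env]), H
```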

Speech Signal Improvement Using Causal Generative Diffusion Models

no code implementations15 Mar 2023 Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Tal Peer, Timo Gerkmann

In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions.

Extending DNN-based Multiplicative Masking to Deep Subband Filtering for Improved Dereverberation

no code implementations1 Mar 2023 Jean-Marie Lemercier, Julian Tobergte, Timo Gerkmann

We demonstrate that the resulting deep subband filtering scheme outperforms multiplicative masking for dereverberation, while leaving the denoising performance virtually the same.

Denoising
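
The contrast between multiplicative masking and subband filtering can be written in a few lines; the sketch below assumes a causal filter of N taps per frequency band and is not the authors' exact parameterisation.

```python
import torch

def multiplicative_masking(Y, M):
    """Multiplicative masking: one gain per time-frequency bin. Y, M: (batch, freq, frames)."""
    return M * Y

def deep_subband_filtering(Y, H):
    """Subband filtering: each bin is filtered over the last N frames of its own band.

    Y: noisy STFT, (batch, freq, frames); H: filters, (batch, freq, frames, N).
    Multiplicative masking is recovered for N == 1. A sketch of the general operation,
    not the authors' exact parameterisation.
    """
    N = H.shape[-1]
    Y_pad = torch.nn.functional.pad(Y, (N - 1, 0))             # pad along the frame axis
    # Stack the current and N-1 past frames for every bin: (batch, freq, frames, N)
    Y_taps = torch.stack([Y_pad[..., k:k + Y.shape[-1]] for k in range(N)], dim=-1)
    return (H * Y_taps).sum(dim=-1)                            # filter-and-sum per band
```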

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

2 code implementations22 Dec 2022 Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions.

Speech Dereverberation

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

no code implementations9 Dec 2022 Huajian Fang, Timo Gerkmann

Single-channel deep speech enhancement approaches often estimate a single multiplicative mask to extract clean speech without a measure of its accuracy.

Speech Enhancement, Uncertainty Quantification

DriftRec: Adapting diffusion models to blind JPEG restoration

1 code implementation12 Nov 2022 Simon Welker, Henry N. Chapman, Timo Gerkmann

In this work, we utilize the high-fidelity generation abilities of diffusion models to solve blind JPEG restoration at high compression levels.

JPEG Artifact Removal

DiffPhase: Generative Diffusion-based STFT Phase Retrieval

no code implementations8 Nov 2022 Tal Peer, Simon Welker, Timo Gerkmann

Diffusion probabilistic models have been recently used in a variety of tasks, including speech enhancement and synthesis.

Imputation, Retrieval +1

Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration

1 code implementation4 Nov 2022 Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

In this paper, we systematically compare the performance of generative diffusion models and discriminative approaches on different speech restoration tasks.

Bandwidth Extension, Speech Denoising +1

Spatially Selective Deep Non-linear Filters for Speaker Extraction

no code implementations4 Nov 2022 Kristina Tesch, Timo Gerkmann

In a scenario with multiple persons talking simultaneously, the spatial characteristics of the signals are the most distinct feature for extracting the target signal.

Speech Separation

Label Uncertainty Modeling and Prediction for Speech Emotion Recognition using t-Distributions

1 code implementation25 Jul 2022 Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann

To address this, emotion annotations are typically collected from multiple annotators and averaged in order to obtain labels for arousal and valence.

Speech Emotion Recognition

Insights Into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement

1 code implementation27 Jun 2022 Kristina Tesch, Timo Gerkmann

The key advantage of using multiple microphones for speech enhancement is that spatial filtering can be used to complement the tempo-spectral processing.

Speech Enhancement

On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement

1 code implementation22 Jun 2022 Kristina Tesch, Nils-Hendrik Mohrmann, Timo Gerkmann

Employing deep neural networks (DNNs) to directly learn filters for multi-channel speech enhancement has two potential key advantages over a traditional approach that combines a linear spatial filter with an independent tempo-spectral post-filter: 1) non-linear spatial filtering can overcome restrictions originating from a linear processing model, and 2) joint processing of spatial and tempo-spectral information can exploit interdependencies between different sources of information.

Speech Enhancement, Speech Extraction
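
For context, a sketch of the "traditional approach" the abstract refers to, i.e. a linear spatial filter (here a simple delay-and-sum beamformer) followed by an independent tempo-spectral post-filter; the joint non-linear DNN filter studied in the paper replaces both stages.

```python
import numpy as np

def traditional_pipeline(Y, steering, postfilter_mask):
    """Classic two-stage baseline: linear spatial filter + independent post-filter (a sketch).

    Y:               multi-channel STFT, shape (mics, freq, frames)
    steering:        per-frequency steering vectors towards the target, shape (mics, freq)
    postfilter_mask: single-channel tempo-spectral gain, shape (freq, frames)
    """
    M = Y.shape[0]
    w = steering / M                                        # delay-and-sum beamformer weights
    beamformed = np.einsum('mf,mft->ft', w.conj(), Y)       # linear spatial filtering
    return postfilter_mask * beamformed                     # independent tempo-spectral post-filter
```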

Beyond Griffin-Lim: Improved Iterative Phase Retrieval for Speech

no code implementations11 May 2022 Tal Peer, Simon Welker, Timo Gerkmann

Phase retrieval is a problem encountered not only in speech and audio processing, but in many other fields such as optics.

Retrieval

Neural Network-augmented Kalman Filtering for Robust Online Speech Dereverberation in Noisy Reverberant Environments

no code implementations6 Apr 2022 Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

In this paper, a neural network-augmented algorithm for noise-robust online dereverberation with a Kalman filtering variant of the weighted prediction error (WPE) method is proposed.

Denoising, Speech Dereverberation
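
As background for the WPE variant mentioned above, here is a simplified offline, single-channel WPE iteration for one frequency band; the paper's method is an online Kalman-filter formulation with a DNN-supported PSD estimate, so treat this only as a sketch of the underlying linear-prediction idea.

```python
import numpy as np

def wpe_single_band(y, taps=10, delay=3, n_iter=3, eps=1e-8):
    """Offline single-channel WPE for one STFT frequency band (a simplified sketch).

    y: complex STFT frames of one band, shape (frames,).
    Late reverberation is predicted from frames at least `delay` frames in the past and
    subtracted; the prediction filter is re-estimated with the updated PSD weights.
    """
    T = len(y)
    d = y.copy()
    # Build the delayed, stacked observation matrix: column k holds y delayed by delay + k frames.
    Y = np.zeros((T, taps), dtype=complex)
    for k in range(taps):
        shift = delay + k
        Y[shift:, k] = y[:T - shift]
    for _ in range(n_iter):
        lam = np.maximum(np.abs(d) ** 2, eps)          # PSD estimate of the desired signal
        Yw = Y / lam[:, None]                          # weight observations by the inverse PSD
        R = Yw.conj().T @ Y                            # weighted correlation matrix
        p = Yw.conj().T @ y                            # weighted cross-correlation
        g = np.linalg.solve(R + eps * np.eye(taps), p) # prediction filter
        d = y - Y @ g                                  # subtract the predicted late reverberation
    return d
```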

A neural network-supported two-stage algorithm for lightweight dereverberation on hearing devices

no code implementations6 Apr 2022 Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

By deriving new metrics that analyze dereverberation performance over various time ranges, we confirm that placing the optimization criterion at the output of the multi-channel linear filtering stage yields more effective dereverberation than placing it at the output of the DNN to optimize the PSD estimation.

Customizable End-to-end Optimization of Online Neural Network-supported Dereverberation for Hearing Devices

no code implementations6 Apr 2022 Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

This work focuses on online dereverberation for hearing devices using the weighted prediction error (WPE) algorithm.

Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

1 code implementation31 Mar 2022 Simon Welker, Julius Richter, Timo Gerkmann

Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals.

Speech Enhancement

Phase-Aware Deep Speech Enhancement: It's All About The Frame Length

no code implementations30 Mar 2022 Tal Peer, Timo Gerkmann

Algorithmic latency in speech processing is dominated by the frame length used for Fourier analysis, which in turn limits the achievable performance of magnitude-centric approaches.

Speech Enhancement
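
The frame-length argument can be made concrete with a bit of arithmetic (assuming synchronous overlap-add processing and no additional look-ahead):

```python
# Algorithmic latency of frame-based STFT processing is governed mainly by the
# frame length: a frame must be fully buffered before it can be transformed.
fs = 16_000                                # sampling rate in Hz
for frame_len in (512, 256, 128, 64):      # frame length in samples
    latency_ms = 1000 * frame_len / fs
    print(f"{frame_len:4d}-sample frame -> {latency_ms:5.1f} ms algorithmic latency")
# 512 -> 32.0 ms, 256 -> 16.0 ms, 128 -> 8.0 ms, 64 -> 4.0 ms
```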

Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement

no code implementations4 Mar 2022 Huajian Fang, Tal Peer, Stefan Wermter, Timo Gerkmann

Speech enhancement in the time-frequency domain is often performed by estimating a multiplicative mask to extract clean speech.

Speech Enhancement

Deep Iterative Phase Retrieval for Ptychography

no code implementations17 Feb 2022 Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann

One of the most prominent challenges in the field of diffractive imaging is the phase retrieval (PR) problem: In order to reconstruct an object from its diffraction pattern, the inverse Fourier transform must be computed.

Retrieval
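
For orientation, a sketch of a classical projection-based phase retrieval baseline (error-reduction with a support constraint); the paper addresses the more involved ptychographic setting with many overlapping diffraction patterns, so this is only illustrative.

```python
import numpy as np

def error_reduction(fourier_mag, support, n_iter=200, seed=0):
    """Classical error-reduction phase retrieval (Gerchberg-Saxton/Fienup style), as a sketch.

    fourier_mag: measured diffraction magnitude, shape (H, W)
    support:     boolean mask of the known object support, shape (H, W)
    """
    rng = np.random.default_rng(seed)
    g = rng.random(fourier_mag.shape) * support            # random start inside the support
    for _ in range(n_iter):
        G = np.fft.fft2(g)
        G = fourier_mag * np.exp(1j * np.angle(G))         # enforce the measured magnitude
        g = np.fft.ifft2(G).real                           # back to object space
        g = np.where(support & (g > 0), g, 0.0)            # enforce support and non-negativity
    return g
```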

End-To-End Label Uncertainty Modeling for Speech-based Arousal Recognition Using Bayesian Neural Networks

1 code implementation7 Oct 2021 Navin Raj Prabhu, Guillaume Carbajal, Nale Lehmann-Willenbrock, Timo Gerkmann

At training, the network learns a distribution of weights to capture the inherent uncertainty related to subjective arousal annotations.

Speech Emotion Recognition
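
"A distribution of weights" can be illustrated with a mean-field Gaussian layer sampled via the reparameterisation trick; this is a generic sketch, not the paper's architecture.

```python
import torch

class BayesianLinear(torch.nn.Module):
    """Mean-field Gaussian linear layer: every forward pass samples a fresh weight matrix.

    A generic illustration of learning a distribution over weights, not the paper's model.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = torch.nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.w_rho = torch.nn.Parameter(torch.full((out_features, in_features), -4.0))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        w_sigma = torch.nn.functional.softplus(self.w_rho)     # positive standard deviation
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)    # reparameterised weight sample
        return torch.nn.functional.linear(x, w, self.bias)

# Predictive uncertainty: run several stochastic forward passes and inspect the spread, e.g.
# preds = torch.stack([layer(x) for _ in range(30)]); mean, std = preds.mean(0), preds.std(0)
```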

Disentanglement Learning for Variational Autoencoders Applied to Audio-Visual Speech Enhancement

1 code implementation19 May 2021 Guillaume Carbajal, Julius Richter, Timo Gerkmann

In this work, we propose to use an adversarial training scheme for variational autoencoders to disentangle the label from the other latent variables.

Attribute, Disentanglement +1

Nonlinear Spatial Filtering in Multichannel Speech Enhancement

no code implementations22 Apr 2021 Kristina Tesch, Timo Gerkmann

Rather, the MMSE optimal filter is a joint spatial and spectral nonlinear function.

Speech Enhancement

Variational Autoencoder for Speech Enhancement with a Noise-Aware Encoder

no code implementations17 Feb 2021 Huajian Fang, Guillaume Carbajal, Stefan Wermter, Timo Gerkmann

Recently, a generative variational autoencoder (VAE) has been proposed for speech enhancement to model speech statistics.

Speech Enhancement

Guided Variational Autoencoder for Speech Enhancement With a Supervised Classifier

no code implementations12 Feb 2021 Guillaume Carbajal, Julius Richter, Timo Gerkmann

In this paper, we propose to guide the variational autoencoder with a supervised classifier separately trained on noisy speech.

Speech Enhancement

Efficient Joint Estimation of Tracer Distribution and Background Signals in Magnetic Particle Imaging using a Dictionary Approach

1 code implementation10 Jun 2020 Tobias Knopp, Mirco Grosser, Matthias Graeser, Timo Gerkmann, Martin Möddel

Background signals are a primary source of artifacts in magnetic particle imaging and limit the sensitivity of the method, since they are often not precisely known and vary over time.

SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement

no code implementations7 Apr 2020 Robert Rehr, Timo Gerkmann

In this paper, we address the generalization of deep neural network (DNN) based speech enhancement to unseen noise conditions for the case that training data is limited in size and diversity.

Speech Enhancement

Robust Robotic Pouring using Audition and Haptics

1 code implementation29 Feb 2020 Hongzhuo Liang, Chuangchuang Zhou, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Fuchun Sun, Marcus Stoffel, Jianwei Zhang

Both network training results and robot experiments demonstrate that MP-Net is robust against noise and changes to the task and environment.

A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

1 code implementation25 Oct 2019 David Ditter, Timo Gerkmann

In this work, we investigate whether the learned encoder of the end-to-end convolutional time-domain audio separation network (Conv-TasNet) is the key to its recent success, or whether it can just as well be replaced by a deterministic hand-crafted filterbank.

Low-latency processing, Speech Separation
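
A deterministic gammatone filterbank encoder can be sketched in a few lines; the ERB model follows Glasberg & Moore, while the tap length and the single-phase form here are simplifying assumptions rather than the paper's multi-phase design.

```python
import numpy as np

def gammatone_impulse_response(fc, fs=8000, length=16, order=4, phase=0.0):
    """Impulse response of a gammatone filter (a sketch of a hand-crafted encoder basis).

    fc: centre frequency in Hz; length: number of taps (short, Conv-TasNet-style filters).
    """
    t = np.arange(length) / fs
    erb = 24.7 + fc / 9.265                      # equivalent rectangular bandwidth in Hz
    b = 1.019 * erb                              # bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + phase)
    return g / (np.linalg.norm(g) + 1e-12)       # unit-norm tap vector

# A fixed analysis filterbank could then replace the learned 1-D conv encoder, e.g. by
# stacking one such tap vector per (centre frequency, phase) pair into a weight matrix.
```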

Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring

1 code implementation2 Mar 2019 Hongzhuo Liang, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Jianwei Zhang

PouringNet is trained on our collected real-world pouring dataset with multimodal sensing data, which contains more than 3000 recordings of audio, force feedback, video and trajectory data of the human hand that performs the pouring task.

Robotics, Sound, Audio and Speech Processing

DNN and CNN with Weighted and Multi-task Loss Functions for Audio Event Detection

no code implementations10 Aug 2017 Huy Phan, Martin Krawczyk-Becker, Timo Gerkmann, Alfred Mertins

Our proposed systems significantly outperform the challenge baseline, improving F-score from 72.7% to 90.0% and reducing detection error rate from 0.53 to 0.18 on average on the development data.

Event Detection, Task 2
