no code implementations • 9 Feb 2024 • H. Nazim Bicer, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets
Room geometry inference algorithms rely on the localization of acoustic reflectors to identify boundary surfaces of an enclosure.
1 code implementation • 7 Sep 2023 • Zeyu Xu, Adrian Herzog, Alexander Lodermeyer, Emanuël A. P. Habets, Albert G. Prinn
The image source method (ISM) is often used to simulate room acoustics due to its ease of use and computational efficiency.
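The idea behind the ISM can be illustrated by enumerating mirror images of the source across the walls of a shoebox room; the sketch below is a minimal, generic implementation with a uniform, frequency-independent wall reflection coefficient `beta` (the function name, parameters, and defaults are illustrative assumptions, not code from the paper):

```python
from itertools import product

import numpy as np

def ism_rir(room, src, mic, fs=16000, c=343.0, beta=0.9,
            max_order=2, rir_len=4096):
    """Minimal image-source-method RIR for a shoebox room.

    room: (Lx, Ly, Lz) dimensions in metres; src/mic: 3D positions.
    beta: single wall reflection coefficient (a simplifying assumption).
    """
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    h = np.zeros(rir_len)
    orders = range(-max_order, max_order + 1)
    for n in product(orders, repeat=3):
        n = np.asarray(n)
        for p in product((0, 1), repeat=3):
            p = np.asarray(p)
            # image source position: mirror src across walls, then translate
            img = (1 - 2 * p) * src + 2 * n * room
            # number of wall reflections encoded by (n, p) over all three axes
            n_refl = int(np.sum(np.abs(n - p) + np.abs(n)))
            if n_refl > max_order:
                continue
            dist = np.linalg.norm(img - mic)
            delay = int(round(dist / c * fs))
            if delay < rir_len:
                # attenuate by wall absorption and spherical spreading
                h[delay] += beta ** n_refl / (4 * np.pi * max(dist, 1e-3))
    return h
```

The n = p = 0 term reproduces the direct path; every other (n, p) pair contributes one delayed, attenuated echo, which is what makes the method both easy to implement and computationally cheap at low reflection orders.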
no code implementations • 28 Aug 2023 • Cagdas Tuna, Altan Akat, H. Nazim Bicer, Andreas Walther, Emanuël A. P. Habets
Motivated by the increasing popularity of commercially available soundbars, this article presents a data-driven 3D room geometry inference (RGI) method using room impulse responses (RIRs) measured from a linear loudspeaker array to a single microphone.
no code implementations • 23 Mar 2023 • Matteo Torcoli, Emanuël A. P. Habets
When dialogue and background sounds are not separately available from the production stage, Dialogue Separation (DS) can estimate them to enable personalization.
no code implementations • 15 Mar 2023 • Mohamed Elminshawi, Srikanth Raj Chetupalli, Emanuël A. P. Habets
By allowing for time-varying embeddings in the single-channel target speaker extraction (TSE) block, the proposed method fully exploits the correspondence between the front-end beamformer output and the target speech in the microphone signal.
no code implementations • 13 Mar 2023 • Julian Wechsler, Srikanth Raj Chetupalli, Wolfgang Mack, Emanuël A. P. Habets
The network is trained to enforce a fixed mapping of regions to network outputs.
no code implementations • 22 Feb 2023 • Philipp Götz, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets
A study is presented in which a contrastive learning approach is used to extract low-dimensional representations of the acoustic environment from single-channel, reverberant speech signals.
no code implementations • 27 Dec 2022 • Thomas Robotham, Ashutosh Singla, Olli S. Rummukainen, Alexander Raake, Emanuël A. P. Habets
Research into multi-modal perception, human cognition, behavior, and attention can benefit from high-fidelity content that can recreate lifelike scenes when rendered on head-mounted displays.
1 code implementation • 5 Aug 2022 • Philipp Götz, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets
A dataset of anechoic recordings of various sound sources encountered in domestic environments is presented.
no code implementations • 28 Jul 2022 • Matteo Torcoli, Thomas Robotham, Emanuël A. P. Habets
A physiological indicator of listening effort (LE) known from audiology studies is pupil size.
no code implementations • 28 Jun 2022 • Ahmad Aloradi, Wolfgang Mack, Mohamed Elminshawi, Emanuël A. P. Habets
Classical speaker verification (SV) approaches estimate a fixed-dimensional embedding from a speech utterance that encodes the speaker's voice characteristics.
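A common back-end for such fixed-dimensional embeddings is cosine scoring of a test embedding against an enrollment embedding, with a tuned decision threshold. The sketch below illustrates this generic scheme only; the function names and the threshold value are assumptions for illustration, not the paper's method:

```python
import numpy as np

def cosine_score(emb_enroll, emb_test):
    """Cosine similarity between two fixed-dimensional speaker embeddings."""
    a = np.asarray(emb_enroll, dtype=float)
    b = np.asarray(emb_test, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(emb_enroll, emb_test, threshold=0.6):
    """Accept the claimed identity if the similarity exceeds the threshold.

    0.6 is an arbitrary placeholder; real systems tune the threshold on
    held-out trials to trade off false acceptances against false rejections.
    """
    return cosine_score(emb_enroll, emb_test) >= threshold
```

In practice the embeddings would come from a trained speaker-discriminative DNN; only the scoring step is shown here.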
no code implementations • 13 Jun 2022 • Adrian Herzog, Srikanth Raj Chetupalli, Emanuël A. P. Habets
Consider a multichannel Ambisonic recording containing a mixture of several reverberant speech signals.
no code implementations • 23 Feb 2022 • Philipp Götz, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets
The estimation of reverberation time from real-world signals plays a central role in a wide range of applications.
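When the room impulse response itself is available, a classical baseline for reverberation time estimation is Schroeder backward integration followed by a line fit on the energy decay curve. The sketch below applies it to a synthetic exponentially decaying noise RIR; the fit range, sampling rate, and toy RIR are assumptions for illustration:

```python
import numpy as np

def rt60_schroeder(rir, fs, db_hi=-5.0, db_lo=-25.0):
    """Estimate RT60 from an RIR via Schroeder backward integration.

    A line is fitted to the energy decay curve (EDC) between db_hi and
    db_lo dB, and the slope is extrapolated to a 60 dB decay.
    """
    # Schroeder integration: EDC(n) = sum of squared samples from n onward
    edc = np.cumsum(rir[::-1] ** 2)[::-1]
    edc_db = 10 * np.log10(edc / edc[0] + 1e-12)
    i_hi = int(np.argmax(edc_db <= db_hi))
    i_lo = int(np.argmax(edc_db <= db_lo))
    t = np.arange(len(rir)) / fs
    slope, _ = np.polyfit(t[i_hi:i_lo], edc_db[i_hi:i_lo], 1)
    return -60.0 / slope

# synthetic RIR with known decay: exp(-6.91 t / T60) reaches -60 dB at t = T60
fs, t60_true = 16000, 0.5
t = np.arange(int(fs * t60_true * 1.5)) / fs
rng = np.random.default_rng(0)
rir = rng.standard_normal(t.size) * np.exp(-6.91 * t / t60_true)
```

Estimating the same quantity blindly from reverberant speech, as in the paper, is considerably harder, since no clean decay curve is observable.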
no code implementations • 1 Feb 2022 • Mohamed Elminshawi, Wolfgang Mack, Srikanth Raj Chetupalli, Soumitro Chakrabarty, Emanuël A. P. Habets
However, such studies have been conducted on only a few datasets and have not considered recent deep neural network architectures for SS that have shown impressive separation performance.
no code implementations • 3 Jan 2022 • Wolfgang Mack, Julian Wechsler, Emanuël A. P. Habets
The impact of attention on direction-of-arrival (DOA) estimators and different training strategies for attention and DOA DNNs have not yet been studied in depth.
no code implementations • 9 Nov 2020 • Shrishti Saha Shetu, Soumitro Chakrabarty, Emanuël A. P. Habets
Audio-visual speech enhancement (AVSE) methods use both audio and visual features for the task of speech enhancement, and the use of visual features has been shown to be particularly effective in multi-speaker scenarios.
no code implementations • 9 Nov 2020 • Fabian Hübner, Wolfgang Mack, Emanuël A. P. Habets
In an evaluation using measured room impulse responses, we demonstrate that a model trained with the proposed training-data generation method performs comparably to models trained on data generated with the source-image method.
no code implementations • 9 Nov 2020 • Mohamed Elminshawi, Wolfgang Mack, Emanuël A. P. Habets
Recent deep learning-based methods leverage a speaker discriminative model that maps a reference snippet uttered by the target speaker into a single embedding vector that encapsulates the characteristics of the target speaker.