no code implementations • 7 Jan 2024 • Darius Petermann, Minje Kim
In this work, we explore the task of hierarchical distance-based speech separation defined on a hyperbolic manifold.
no code implementations • 14 Mar 2023 • Darius Petermann, Inseon Jang, Minje Kim
Spectral sub-bands do not all carry the same perceptual relevance.
no code implementations • 14 Dec 2022 • Darius Petermann, Gordon Wichern, Aswin Shanmugam Subramanian, Zhong-Qiu Wang, Jonathan Le Roux
In this paper, we focus on the cocktail fork problem, which takes a three-pronged approach to source separation by separating an audio mixture such as a movie soundtrack or podcast into the three broad categories of speech, music, and sound effects (SFX, understood to include ambient noise and natural sound events).
no code implementations • 9 Dec 2022 • Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux
We introduce a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features.
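As a rough illustration of why hyperbolic embeddings suit hierarchies, the sketch below computes geodesic distance in the Poincaré ball, one common model of hyperbolic space. The choice of this model and the function name `poincare_distance` are assumptions made for illustration; the excerpt above does not state which model or implementation the framework uses.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # Standard closed form: arcosh(1 + 2*|u-v|^2 / ((1-|u|^2)(1-|v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Two pairs with the same angular separation: the pair near the boundary
# is much farther apart than the pair near the origin.
near_origin = poincare_distance([0.1, 0.0], [0.0, 0.1])
near_boundary = poincare_distance([0.9, 0.0], [0.0, 0.9])
```

Distances grow rapidly toward the boundary of the ball, so points near the origin can act as coarse, parent-like nodes while points near the edge behave like well-separated leaves.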
no code implementations • 15 Feb 2022 • Darius Petermann, Minje Kim
With recent advances in data-driven approaches using deep neural networks, music source separation has been formulated as an instrument-specific supervised problem.
3 code implementations • 19 Oct 2021 • Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux
The cocktail party problem aims at isolating any source of interest within a complex acoustic scene, and has long inspired audio source separation research.
no code implementations • 22 Jul 2021 • Darius Petermann, SeungKwon Beack, Minje Kim
The assumption is that, in a mirrored autoencoder topology, a decoder layer reconstructs the intermediate feature representation of its corresponding encoder layer.
no code implementations • 17 Aug 2020 • Darius Petermann, Pritish Chandna, Helena Cuesta, Jordi Bonada, Emilia Gomez
However, most research has focused on the typical case of separating vocal, percussion, and bass sources from a mixture, each of which has a distinct spectral structure.