no code implementations • 14 Nov 2022 • Anastasia Kuznetsova, Aswin Sivaraman, Minje Kim
In the proposed method, we show that the quality of the NSS system's synthetic data matters: if it is good enough, the augmented dataset can be used to improve the PSE system so that it outperforms the speaker-agnostic baseline.
no code implementations • 8 May 2021 • Aswin Sivaraman, Minje Kim
To this end, we propose using an ensemble model wherein each specialist module denoises noisy utterances from a distinct partition of training set speakers.
no code implementations • 5 Apr 2021 • Aswin Sivaraman, Sunwoo Kim, Minje Kim
Training personalized speech enhancement models is innately a no-shot learning problem due to privacy constraints and limited access to noise-free speech from the target user.
no code implementations • 5 Apr 2021 • Aswin Sivaraman, Minje Kim
To this end, we pose personalization as either a zero-shot task, in which no additional clean speech of the target speaker is used for training, or a few-shot learning task, in which the goal is to minimize the duration of the clean speech used for transfer learning.
no code implementations • EACL 2021 • Sravana Reddy, Yongze Yu, Aasish Pappu, Aswin Sivaraman, Rezvaneh Rezapour, Rosie Jones
Podcast episodes often contain material extraneous to the main content, such as advertisements, interleaved within the audio and the written descriptions.
1 code implementation • 6 Nov 2020 • Aswin Sivaraman, Minje Kim
This work explores how self-supervised learning can be universally used to discover speaker-specific features towards enabling personalized speech enhancement models.
1 code implementation • 16 May 2020 • Aswin Sivaraman, Minje Kim
In this paper, we investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks.
no code implementations • 3 Feb 2019 • Sanna Wager, George Tzanetakis, Cheng-i Wang, Lijiang Guo, Aswin Sivaraman, Minje Kim
This approach differs from commercially used automatic pitch correction systems, where notes in the vocal tracks are shifted to be centered around notes in a user-defined score or mapped to the closest pitch among the twelve equal-tempered scale degrees.
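As a rough illustration of the conventional behavior described in the last entry (mapping a sung pitch to the closest of the twelve equal-tempered scale degrees), the sketch below snaps a frequency to the nearest semitone. It is not taken from the paper. The A4 = 440 Hz reference tuning and the function names `snap_to_equal_temperament` and `cents_offset` are assumptions introduced here for illustration.

```python
import math

# Assumed reference tuning (not stated in the source): A4 = 440 Hz.
A4_HZ = 440.0


def snap_to_equal_temperament(freq_hz: float) -> float:
    """Return the frequency of the equal-tempered semitone closest to freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Distance from A4 in (fractional) semitones.
    semitones_from_a4 = 12.0 * math.log2(freq_hz / A4_HZ)
    # Round to the closest whole semitone, then convert back to Hz.
    nearest = round(semitones_from_a4)
    return A4_HZ * 2.0 ** (nearest / 12.0)


def cents_offset(freq_hz: float) -> float:
    """Return how far freq_hz sits from its snapped pitch, in cents."""
    return 1200.0 * math.log2(freq_hz / snap_to_equal_temperament(freq_hz))
```

For example, a slightly sharp 450 Hz input snaps back to A4, while 455 Hz is closer to the semitone above and snaps up to it.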