1 code implementation • 2 Apr 2024 • Tanvir Mahmud, Saeed Amizadeh, Kazuhito Koishida, Diana Marculescu
Conditional sound separation in multi-source audio mixtures without having access to single-source sound data during training is a long-standing challenge.
1 code implementation • 14 Mar 2024 • Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida
Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks.
no code implementations • 13 Feb 2024 • Chih-Yu Lai, Dung Tran, Kazuhito Koishida
Learned image compression has gained widespread popularity for its efficiency in achieving ultra-low bit-rates.
no code implementations • 20 Dec 2023 • Oscar Chang, Dung N. Tran, Kazuhito Koishida
Generalization remains a major problem in supervised learning of single-channel speech enhancement.
1 code implementation • 1 Nov 2023 • Amrit Romana, Kazuhito Koishida, Emily Mower Provost
We find that disfluency detection performance is largely limited by the quality of transcripts and alignments.
Automatic Speech Recognition (ASR)
1 code implementation • 19 Sep 2023 • Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi
Diffusion models power a vast majority of text-to-audio (TTA) generation methods.
Ranked #10 on Audio Generation on AudioCaps
no code implementations • 26 Oct 2022 • Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida
In recent years, Generative Adversarial Networks (GANs) have produced significantly improved results in speech enhancement (SE) tasks.
Ranked #2 on Speech Enhancement on VoiceBank + DEMAND
no code implementations • 21 Dec 2021 • Melikasadat Emami, Dung Tran, Kazuhito Koishida
Improving generalization is a major challenge in audio classification due to labeled data scarcity.
no code implementations • 9 Dec 2021 • Bahareh Tolooshams, Kazuhito Koishida
Deep learning-based speech enhancement has shown unprecedented performance in recent years.
no code implementations • 8 Dec 2021 • Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida
Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data.
2 code implementations • 6 Jan 2021 • Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan
In this version of the challenge, organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full-band scenarios.
no code implementations • ICML 2020 • Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida
To address this, we propose (1) a framework to isolate and evaluate the reasoning aspect of VQA separately from its perception, and (2) a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.
1 code implementation • CVPR 2020 • Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida
In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream and the scores of each modality are fused at the end.
Ranked #3 on Hand Gesture Recognition on NVGesture
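The late-fusion scheme described in this entry can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes each unimodal stream has already produced a vector of per-class scores, and it fuses them by a weighted average of softmax probabilities, which is one common late-fusion rule chosen here for illustration. The modality names (`rgb_scores`, `depth_scores`) and their values are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    per_modality_scores: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse class scores from independent unimodal streams at the very end.

    Each stream's scores are turned into probabilities separately; the streams
    only interact through this final weighted average (uniform by default).
    """
    n = len(per_modality_scores)
    if weights is None:
        weights = [1.0 / n] * n
    probs = [softmax(s) for s in per_modality_scores]
    num_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(num_classes)]


# Hypothetical per-class scores from two separate CNN streams (3 gesture classes).
rgb_scores = [2.0, 0.5, -1.0]
depth_scores = [0.1, 1.5, -0.5]

fused = late_fusion([rgb_scores, depth_scores])
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because the streams never exchange intermediate features, each can be trained or swapped independently; the trade-off is that cross-modal interactions are only captured at the score level.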
no code implementations • 4 Aug 2019 • Wei Xia, Kazuhito Koishida
In this study, we introduce a convolutional time-frequency-channel "Squeeze and Excitation" (tfc-SE) module to explicitly model inter-dependencies between the time-frequency domain and multiple channels.
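The tfc-SE module builds on "Squeeze and Excitation" recalibration. Below is a minimal pure-Python sketch of the standard channel-wise SE step on a feature map indexed as `[channel][time][frequency]`: squeeze each channel by global average pooling over the time-frequency plane, pass the result through a small bottleneck MLP with a sigmoid, and rescale each channel by its gate. This snippet does not say how the paper extends gating to the time and frequency axes, so that part is omitted. The weights `w1` and `w2` are hypothetical fixed values standing in for learned parameters.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # indexed as [channel][time][freq]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w: Matrix, v: List[float]) -> List[float]:
    # Plain matrix-vector product: one output per row of w.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def squeeze_excite(x: Tensor3D, w1: Matrix, w2: Matrix) -> Tensor3D:
    # Squeeze: global average pooling over the time-frequency plane of each channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck MLP (reduce, ReLU, expand, sigmoid) -> one gate per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [sigmoid(g) for g in matvec(w2, hidden)]
    # Scale: reweight each channel's time-frequency map by its gate.
    return [[[gate * v for v in row] for row in ch] for gate, ch in zip(gates, x)]


# Hypothetical 2-channel input with 2 time frames and 3 frequency bins.
x = [
    [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]],
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
]
w1 = [[0.5, 0.5]]      # reduces 2 channels to 1 hidden unit
w2 = [[2.0], [-2.0]]   # expands back to one gate logit per channel

out = squeeze_excite(x, w1, w2)
```

Here the first channel is amplified relative to the second because its gate is `sigmoid(3.0)` versus `sigmoid(-3.0)`; in a trained model these gates would be learned so that informative channels are emphasized.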