Search Results for author: Kazuhito Koishida

Found 14 papers, 6 papers with code

Weakly-supervised Audio Separation via Bi-modal Semantic Similarity

1 code implementation • 2 Apr 2024 • Tanvir Mahmud, Saeed Amizadeh, Kazuhito Koishida, Diana Marculescu

Conditional sound separation in multi-source audio mixtures without access to single-source sound data during training is a long-standing challenge.

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

1 code implementation • 14 Mar 2024 • Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida

Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks.

Learned Image Compression with Text Quality Enhancement

no code implementations • 13 Feb 2024 • Chih-Yu Lai, Dung Tran, Kazuhito Koishida

Learned image compression has gained widespread popularity for its efficiency in achieving ultra-low bit-rates.

Image Compression

Single-channel speech enhancement using learnable loss mixup

no code implementations • 20 Dec 2023 • Oscar Chang, Dung N. Tran, Kazuhito Koishida

Generalization remains a major problem in supervised learning of single-channel speech enhancement.

Speech Enhancement

Automatic Disfluency Detection from Untranscribed Speech

1 code implementation • 1 Nov 2023 • Amrit Romana, Kazuhito Koishida, Emily Mower Provost

We find that disfluency detection performance is largely limited by the quality of transcripts and alignments.

Automatic Speech Recognition (ASR) +2

Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

1 code implementation • 19 Sep 2023 • Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi

Diffusion models power a vast majority of text-to-audio (TTA) generation methods.

Ranked #10 on Audio Generation on AudioCaps

AudioCaps Audio Generation +1

SCP-GAN: Self-Correcting Discriminator Optimization for Training Consistency Preserving Metric GAN on Speech Enhancement Tasks

no code implementations • 26 Oct 2022 • Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida

In recent years, Generative Adversarial Networks (GANs) have produced significantly improved results in speech enhancement (SE) tasks.

Ranked #2 on Speech Enhancement on VoiceBank + DEMAND

Speech Enhancement

Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations

no code implementations • 21 Dec 2021 • Melikasadat Emami, Dung Tran, Kazuhito Koishida

Improving generalization is a major challenge in audio classification due to labeled data scarcity.

Audio Classification Contrastive Learning +1

A Training Framework for Stereo-Aware Speech Enhancement using Deep Neural Networks

no code implementations • 9 Dec 2021 • Bahareh Tolooshams, Kazuhito Koishida

Deep learning-based speech enhancement has shown unprecedented performance in recent years.

Speech Enhancement

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

no code implementations • 8 Dec 2021 • Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida

Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data.

Self-Supervised Learning Voice Conversion

Interspeech 2021 Deep Noise Suppression Challenge

2 code implementations • 6 Jan 2021 • Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios.

Denoising

Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"

no code implementations • ICML 2020 • Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida

To address this, we propose (1) a framework to isolate and evaluate the reasoning aspect of VQA separately from its perception, and (2) a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.

Graph Generation Question Answering +4

MMTM: Multimodal Transfer Module for CNN Fusion

1 code implementation • CVPR 2020 • Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida

In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream and the scores of each modality are fused at the end.

Ranked #3 on Hand Gesture Recognition on NVGesture

Action Recognition In Videos Hand Gesture Recognition +3

Sound Event Detection in Multichannel Audio using Convolutional Time-Frequency-Channel Squeeze and Excitation

no code implementations • 4 Aug 2019 • Wei Xia, Kazuhito Koishida

In this study, we introduce a convolutional time-frequency-channel "Squeeze and Excitation" (tfc-SE) module to explicitly model inter-dependencies between the time-frequency domain and multiple channels.

Event Detection Sound Event Detection
