Search Results for author: Kazuhito Koishida

Found 14 papers, 6 papers with code

Weakly-supervised Audio Separation via Bi-modal Semantic Similarity

1 code implementation • 2 Apr 2024 • Tanvir Mahmud, Saeed Amizadeh, Kazuhito Koishida, Diana Marculescu

Conditional sound separation in multi-source audio mixtures without having access to single-source sound data during training is a long-standing challenge.

Semantic Similarity • Semantic Textual Similarity

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

1 code implementation • 14 Mar 2024 • Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida

Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks.

Learned Image Compression with Text Quality Enhancement

no code implementations • 13 Feb 2024 • Chih-Yu Lai, Dung Tran, Kazuhito Koishida

Learned image compression has gained widespread popularity for its efficiency in achieving ultra-low bit-rates.

Image Compression

Single-channel speech enhancement using learnable loss mixup

no code implementations • 20 Dec 2023 • Oscar Chang, Dung N. Tran, Kazuhito Koishida

Generalization remains a major problem in supervised learning of single-channel speech enhancement.

Speech Enhancement

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

no code implementations • 8 Dec 2021 • Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida

Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data.

Self-Supervised Learning • Voice Conversion

Interspeech 2021 Deep Noise Suppression Challenge

2 code implementations • 6 Jan 2021 • Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full-band scenarios.

Denoising

Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"

no code implementations • ICML 2020 • Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida

To address this, we propose (1) a framework to isolate and evaluate the reasoning aspect of VQA separately from its perception, and (2) a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.

Graph Generation • Question Answering • +4

MMTM: Multimodal Transfer Module for CNN Fusion

1 code implementation • CVPR 2020 • Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida

In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream and the scores of each modality are fused at the end.

Action Recognition In Videos • Hand Gesture Recognition • +3
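The MMTM snippet describes late fusion: each modality runs through its own CNN stream and only the per-modality scores are combined at the end. The sketch below illustrates that generic late-fusion baseline only, not MMTM itself. It assumes each stream already outputs one logit per class, converts each stream's logits to probabilities, and takes a weighted average; the helper names `softmax` and `late_fusion`, the equal default weights, and the example scores are hypothetical choices for illustration.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one modality's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(per_modality_logits: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> list[float]:
    """Combine class scores from independent unimodal streams at the very end.

    Each entry is the final output of one modality's own network; no features
    are exchanged between streams before this point.
    """
    n = len(per_modality_logits)
    if weights is None:
        weights = [1.0 / n] * n  # assumption: equal weight per modality
    probs = [softmax(logits) for logits in per_modality_logits]
    num_classes = len(probs[0])
    # Weighted average of per-modality class probabilities.
    return [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(num_classes)]


# Usage: hypothetical 3-class scores from an RGB stream and a depth stream.
rgb_logits = [2.0, 0.5, 0.1]
depth_logits = [0.2, 1.5, 0.3]
fused = late_fusion([rgb_logits, depth_logits])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Averaging probabilities rather than raw logits keeps each stream's scores on a comparable scale; learned or validation-tuned weights are a common alternative to the equal weights assumed here.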

Sound Event Detection in Multichannel Audio using Convolutional Time-Frequency-Channel Squeeze and Excitation

no code implementations • 4 Aug 2019 • Wei Xia, Kazuhito Koishida

In this study, we introduce a convolutional time-frequency-channel "Squeeze and Excitation" (tfc-SE) module to explicitly model inter-dependencies between the time-frequency domain and multiple channels.

Event Detection • Sound Event Detection
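The tfc-SE module in the snippet above extends the "Squeeze and Excitation" idea. As a hedged illustration, the sketch below implements only the standard channel-wise squeeze-and-excitation step on a feature map laid out as channels x time x frequency: average each channel over time and frequency (squeeze), pass the channel descriptors through a small bottleneck MLP ending in a sigmoid (excitation), and rescale each channel by its gate. It does not reproduce the paper's time-frequency-channel variant; the layout, the function name `squeeze_excite`, the layer sizes, and the random weights are all assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def squeeze_excite(x, w1, b1, w2, b2):
    """Channel-wise squeeze-and-excitation on a C x T x F feature map.

    x  : nested lists indexed [channel][time][freq]
    w1 : R x C weights of the bottleneck layer, b1 : R biases
    w2 : C x R weights of the expansion layer,  b2 : C biases
    Returns the recalibrated map and the per-channel gates.
    """
    # Squeeze: global average over time and frequency gives one value per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC (C -> R) + ReLU, then FC (R -> C) + sigmoid -> gates in (0, 1).
    h = [max(0.0, sum(w * zc for w, zc in zip(row, z)) + b) for row, b in zip(w1, b1)]
    s = [sigmoid(sum(w * hr for w, hr in zip(row, h)) + b) for row, b in zip(w2, b2)]
    # Scale: multiply every time-frequency bin of channel c by its gate s[c].
    out = [[[s_c * v for v in row] for row in ch] for ch, s_c in zip(x, s)]
    return out, s


# Usage with random weights (hypothetical sizes: 4 channels reduced to 2).
random.seed(0)
C, R, T, F = 4, 2, 3, 5
x = [[[random.uniform(-1, 1) for _ in range(F)] for _ in range(T)] for _ in range(C)]
w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(R)]
b1 = [0.0] * R
w2 = [[random.gauss(0, 0.5) for _ in range(R)] for _ in range(C)]
b2 = [0.0] * C
y, gates = squeeze_excite(x, w1, b1, w2, b2)
print([round(g, 3) for g in gates])
```

The gates let the network emphasize informative channels and suppress others; a time-frequency-channel variant would apply analogous recalibration along the other axes as well, in a way this sketch does not attempt to reproduce.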
