Search Results for author: Yuma Koizumi

Found 31 papers, 9 papers with code

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

no code implementations • 30 May 2023 Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

1 code implementation • 3 Mar 2023 Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani

Experiments show that Miipher (i) is robust against various audio degradations and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.

Speech Denoising • Speech Enhancement

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

no code implementations • 3 Oct 2022 Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani

DDPMs and GANs can be characterized by the iterative denoising framework and adversarial training, respectively; a toy sketch of the fixed-point-iteration view appears below.

Denoising
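Since WaveFit is built on fixed-point iteration, a toy NumPy sketch of that numerical scheme may help. The contraction map `F`, the stand-in `clean` waveform, and the step size `alpha` are all hypothetical; WaveFit's real update is a learned DDPM-style denoiser with gain normalization, not this oracle.

```python
import numpy as np

# Toy fixed-point iteration: repeatedly apply an update map F until the
# signal stops changing. Here F is a hypothetical contraction whose fixed
# point is a known "clean" waveform.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 256))  # stand-in waveform
noisy = clean + 0.5 * rng.standard_normal(256)

def F(x, alpha=0.5):
    """One refinement step: contract x toward the fixed point `clean`."""
    return x - alpha * (x - clean)

x = noisy
for _ in range(10):  # a fixed, small number of non-autoregressive iterations
    x = F(x)
print("max residual after 10 steps:", np.max(np.abs(x - clean)))
```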

Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

2 code implementations • 13 Jun 2022 Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi

We present the task description and discussion on the results of the DCASE 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques".

Domain Classification • Domain Generalization +1

Mask scalar prediction for improving robust automatic speech recognition

no code implementations • 26 Apr 2022 Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi

Using neural-network-based acoustic frontends to improve the robustness of streaming automatic speech recognition (ASR) systems is challenging because of causality constraints and the resulting distortion that frontend processing introduces into speech.

Acoustic Echo Cancellation • Automatic Speech Recognition +2

SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

no code implementations • 31 Mar 2022 Yuma Koizumi, Heiga Zen, Kohei Yatabe, Nanxin Chen, Michiel Bacchiani

Neural vocoders using denoising diffusion probabilistic models (DDPMs) have been improved by adapting the diffusion noise distribution to the given acoustic features; a toy sketch of this noise shaping follows below.

Denoising • Speech Enhancement
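As a rough illustration of noise spectral shaping, the sketch below colors white noise so its spectrum follows a fixed envelope taken from a reference signal. This is a sketch under assumptions, not SpecGrad's actual time-varying filter (which is derived from the mel spectrogram); all signals here are synthetic stand-ins.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
fs = 16000
reference = rng.standard_normal(fs)  # stand-in for the target speech signal

# Estimate one average spectral envelope from the reference.
_, _, S = stft(reference, fs=fs, nperseg=512)
envelope = np.abs(S).mean(axis=1, keepdims=True)

# Shape white noise so its power spectrum follows that envelope.
white = rng.standard_normal(fs)
_, _, N = stft(white, fs=fs, nperseg=512)
_, shaped = istft(N * envelope, fs=fs, nperseg=512)
```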

SNRi Target Training for Joint Speech Enhancement and Recognition

no code implementations • 1 Nov 2021 Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani

Furthermore, by analyzing the predicted target SNRi, we observed that the jointly trained network automatically controls the target SNRi according to the noise characteristics.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method

1 code implementation • 10 May 2021 Koichi Saito, Tomohiko Nakamura, Kohei Yatabe, Yuma Koizumi, Hiroshi Saruwatari

Audio source separation is often used as a preprocessing step for various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with a variety of audio signals.

Audio Source Separation • Music Source Separation
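The key idea behind the sampling-frequency-independent layer, sketched below under simplifying assumptions, is that the layer parameterizes a continuous-time impulse response and realizes a discrete kernel for whatever sampling rate the input uses. The Gaussian `h` is a hypothetical stand-in for the learned analog filter.

```python
import numpy as np

def h(t, tau=1e-3):
    """Hypothetical continuous-time impulse response (toy Gaussian)."""
    return np.exp(-(t / tau) ** 2)

def kernel(fs, length=32):
    """Impulse-invariant discretization: sample h(t) at rate fs, scale by 1/fs."""
    t = np.arange(length) / fs
    return h(t) / fs

k16 = kernel(16000)  # FIR kernel for 16 kHz inputs
k48 = kernel(48000)  # same analog filter, realized for 48 kHz inputs
```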

Noisy-target Training: A Training Strategy for DNN-based Speech Enhancement without Clean Speech

no code implementations • 21 Jan 2021 Takuya Fujimura, Yuma Koizumi, Kohei Yatabe, Ryoichi Miyazaki

This requirement currently restricts the amount of training data for speech enhancement to less than 1/1000 of that available for speech recognition, which does not need clean signals; a sketch of the noisy-target workaround follows below.

Speech Enhancement • Speech Recognition +1
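A minimal sketch of the noisy-target idea, assuming a toy linear model in place of the paper's DNN: the target is itself a noisy signal, and the input is that target plus one further noise realization, so no clean speech is ever needed.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal((1000, 64))         # noisy speech used as the TARGET
x = y + 0.3 * rng.standard_normal(y.shape)  # input = target + extra noise

# Plain MSE regression toward the noisy target; a linear map stands in for
# the enhancement DNN.
W = np.zeros((64, 64))
lr = 1e-3
for _ in range(200):
    grad = x.T @ (x @ W - y) / len(x)
    W -= lr * grad
```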

Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

no code implementations • 24 Sep 2020 Daiki Takeuchi, Yuma Koizumi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning.

Audio Captioning • Data Augmentation +1

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

no code implementations • 1 Jul 2020 Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This technical report describes the system submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning.

Audio Captioning • Caption Generation +2

A Transformer-based Audio Captioning Model with Keyword Estimation

no code implementations • 1 Jul 2020 Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito

TRACKE estimates keywords, which comprise a word set corresponding to audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords to reduce word-selection indeterminacy.

Acoustic Scene Classification • Audio Captioning +2

Listen to What You Want: Neural Network-based Universal Sound Selector

no code implementations • 10 Jun 2020 Tsubasa Ochiai, Marc Delcroix, Yuma Koizumi, Hiroaki Ito, Keisuke Kinoshita, Shoko Araki

In this paper, we instead propose a universal sound selection neural network that can directly select AE sounds from a mixture given user-specified target AE classes.
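A minimal sketch of class-conditioned selection, assuming a single sigmoid layer in place of the paper's network: a one-hot vector of the desired AE classes is appended to each mixture frame, and the model outputs a T-F mask that keeps only those classes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_bins, n_frames = 4, 128, 50
mixture = np.abs(rng.standard_normal((n_frames, n_bins)))  # |STFT| of mixture

target = np.zeros(n_classes)
target[[0, 2]] = 1.0  # user requests acoustic-event classes 0 and 2

# Concatenate the class vector to every frame and predict a sigmoid mask.
W = 0.01 * rng.standard_normal((n_bins + n_classes, n_bins))
inp = np.concatenate([mixture, np.tile(target, (n_frames, 1))], axis=1)
mask = 1.0 / (1.0 + np.exp(-(inp @ W)))
selected = mask * mixture  # spectrogram containing only the requested events
```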

Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention

no code implementations • 14 Feb 2020 Yuma Koizumi, Kohei Yatabe, Marc Delcroix, Yoshiki Masuyama, Daiki Takeuchi

This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features; we extract a speaker representation used for adaptation directly from the test utterance.

Multi-Task Learning • Speaker Identification +3
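A minimal sketch of the self-adaptation step, with mean pooling standing in for the paper's learned speaker encoder: the auxiliary embedding comes from the test utterance itself, so no enrollment audio is required.

```python
import numpy as np

rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((100, 257)))  # |STFT| of the test utterance

# Toy speaker representation: an utterance-level mean (the paper uses a
# learned encoder with attention instead).
embedding = spec.mean(axis=0)

# Append the embedding to every frame as the auxiliary conditioning feature.
cond = np.concatenate([spec, np.tile(embedding, (spec.shape[0], 1))], axis=1)
```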

Phase reconstruction based on recurrent phase unwrapping with deep neural networks

no code implementations • 14 Feb 2020 Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

In the proposed method, DNNs estimate phase derivatives instead of the phase itself, which allows us to avoid the sensitivity problem; integrating the derivatives then recovers the phase, as sketched below.

Audio Synthesis
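A minimal sketch of why derivative estimation sidesteps the wrapping problem: given per-bin phase increments along time (here random stand-ins for the DNN outputs), the phase follows by cumulative summation and a final re-wrap.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_bins = 100, 257
dphi_dt = rng.uniform(-np.pi, np.pi, (n_frames, n_bins))  # stand-in DNN output

phase = np.cumsum(dphi_dt, axis=0)    # integrate increments over frames
phase = np.angle(np.exp(1j * phase))  # wrap back into (-pi, pi]
```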

Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

no code implementations • 14 Feb 2020 Masaki Kawanaka, Yuma Koizumi, Ryoichi Miyazaki, Kohei Yatabe

To evaluate subjective quality, several methods based on perceptually-motivated objective sound quality assessment (OSQA) have been proposed, such as PESQ (perceptual evaluation of speech quality).

Speech Enhancement
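A minimal sketch of the black-box strategy under toy assumptions: since PESQ itself is not differentiable, one fits a differentiable proxy to (signal, score) pairs and trains the enhancement DNN against the proxy. Here `fake_pesq` is a made-up scoring function and a line fit stands in for the proxy network.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_pesq(err_power):
    """Hypothetical quality score that falls as the error power grows."""
    return 4.5 - 4.0 * np.tanh(err_power)

errs = rng.uniform(0.0, 2.0, 500)   # error powers of enhanced signals
scores = fake_pesq(errs)            # black-box scores for each signal
a, b = np.polyfit(errs, scores, 1)  # differentiable linear proxy

# An enhancement DNN would now be trained to maximize a * err + b, i.e. with
# loss = -(a * err + b), instead of a plain MSE objective.
```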

Invertible DNN-based nonlinear time-frequency transform for speech enhancement

1 code implementation • 25 Nov 2019 Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

Therefore, some end-to-end methods used a DNN to learn the linear T-F transform, which is much easier to understand.

Audio and Speech Processing • Sound

DOA Estimation by DNN-based Denoising and Dereverberation from Sound Intensity Vector

no code implementations • 10 Oct 2019 Masahiro Yasuda, Yuma Koizumi, Luca Mazzon, Shoichiro Saito, Hisashi Uematsu

We propose a direction of arrival (DOA) estimation method that combines sound-intensity vector (IV)-based DOA estimation and DNN-based denoising and dereverberation.

Denoising
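A minimal sketch of the intensity-vector step, assuming first-order ambisonics channels (W, X, Y, Z): the active intensity per T-F bin is Re{conj(W)·[X, Y, Z]}, and its aggregated direction gives the DOA. The complex spectra below are random stand-ins; the paper denoises and dereverberates the IVs with a DNN before this conversion.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (100, 257)  # frames x frequency bins
W = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
X, Y, Z = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
           for _ in range(3))

# Active sound intensity vector per T-F bin, then a global average direction.
I = np.stack([(np.conj(W) * C).real for C in (X, Y, Z)], axis=-1)
v = I.sum(axis=(0, 1))
azimuth = np.degrees(np.arctan2(v[1], v[0]))
elevation = np.degrees(np.arctan2(v[2], np.hypot(v[0], v[1])))
```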

ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection

2 code implementations • 9 Aug 2019 Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Noboru Harada, Keisuke Imoto

To build a large-scale dataset for ADMOS, we collected anomalous operating sounds of miniature machines (toys) by deliberately damaging them.

Anomaly Detection

Deep Griffin-Lim Iteration

no code implementations • 10 Mar 2019 Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

This paper presents a novel phase reconstruction method (only from a given amplitude spectrogram) by combining a signal-processing-based approach and a deep neural network (DNN).
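For context, the signal-processing half is the classic Griffin-Lim iteration, sketched below with SciPy on a random stand-in signal: alternate between imposing the known magnitude and re-analyzing for spectrogram consistency. The paper's contribution is to insert a trained DNN into this loop, which is not shown here.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)  # stand-in signal
_, _, S = stft(x, nperseg=512)
mag = np.abs(S)                 # only the amplitude spectrogram is given

phase = np.exp(1j * rng.uniform(-np.pi, np.pi, mag.shape))
for _ in range(32):
    _, y = istft(mag * phase, nperseg=512)  # impose magnitude, go to time domain
    _, _, S2 = stft(y, nperseg=512)         # re-analyze: consistency projection
    phase = np.exp(1j * np.angle(S2))       # keep the phase, restore the magnitude
```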

AdaFlow: Domain-Adaptive Density Estimator with Application to Anomaly Detection and Unpaired Cross-Domain Translation

no code implementations • 14 Dec 2018 Masataka Yamaguchi, Yuma Koizumi, Noboru Harada

To address this difficulty, we propose AdaFlow, a new DNN-based density estimator that can be easily adapted to changes in the distribution.

Density Estimation • Translation +1
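A minimal sketch of the adaptation idea, with a plain normalization layer standing in for a full normalizing flow: all weights are shared across domains, and adapting to a new domain only means re-estimating that domain's normalization statistics, with no gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)
src = 2.0 * rng.standard_normal((1000, 8)) + 1.0  # source-domain features
tgt = 0.5 * rng.standard_normal((1000, 8)) - 3.0  # shifted target domain

def normalize(x):
    """Domain-specific statistics; everything else would be shared weights."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

z_src = normalize(src)  # both domains map to roughly N(0, I)
z_tgt = normalize(tgt)  # adaptation = recomputing statistics only
```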

Trainable Adaptive Window Switching for Speech Enhancement

no code implementations • 5 Nov 2018 Yuma Koizumi, Noboru Harada, Yoichi Haneda

To overcome this problem, we incorporate AWS into the speech enhancement procedure, and the windowing function of each time-frame is manipulated using a DNN depending on the input signal.

Speech Enhancement

DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score

no code implementations • 22 Oct 2018 Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Yoichi Haneda

Since OSQA scores have been used widely for sound-quality evaluation, constructing DNNs to increase OSQA scores would be better than using the minimum-MSE criterion to create high-quality output signals.

Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma

1 code implementation • 22 Oct 2018 Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Yuta Kawachi, Noboru Harada

To calculate the TPR in the objective function, we consider the set of anomalous sounds to be the complement of the set of normal sounds and simulate anomalous sounds using a rejection sampling algorithm, as sketched below.

LEMMA • Unsupervised Anomaly Detection +1
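A minimal 1-D sketch of that rejection-sampling step, with Gaussians standing in for the learned feature distributions: draw candidates from a broad proposal and keep only those the normal-sound model deems unlikely, yielding pseudo-anomalies for the TPR term.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
candidates = rng.normal(0.0, 4.0, size=10_000)  # broad proposal samples
normal_density = norm.pdf(candidates, loc=0.0, scale=1.0)

keep = normal_density < 0.05         # reject anything plausibly "normal"
pseudo_anomalies = candidates[keep]  # complement-set samples for training
```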
