Search Results for author: Shusuke Takahashi

Found 19 papers, 10 papers with code

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

no code implementations • 23 May 2024 • Shiqi Yang, Zhi Zhong, Mengjie Zhao, Shusuke Takahashi, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

The recent audio-visual generation methods usually resort to huge large language model or composable diffusion models.

Audio Generation Denoising +2

Paper
Add Code

The Sound Demixing Challenge 2023 $\unicode{x2013}$ Cinematic Demixing Track

1 code implementation • 14 Aug 2023 • Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

A significant source of this improvement was making the simulated data better match real cinematic audio, which we further investigate in detail.

Paper
Code

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

1 code implementation • NeurIPS 2023 • Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e. g., sounds of footsteps come from the feet of a walker.

Sound Event Localization and Detection

Paper
Code

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

no code implementations • 18 May 2023 • Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji

At the decoded feature level, we fuse the two decoded features by generative and predictive decoders.

Decoder Speech Enhancement

Paper
Add Code

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation

1 code implementation • 13 May 2023 • Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji

We modify the target network, i. e., the network architecture of the original DNN-based MSS, by adding bridging paths for each output instrument to share their information.

Music Source Separation

2,140

Paper
Code

Diffusion-based Signal Refiner for Speech Separation

no code implementations • 10 May 2023 • Masato Hirano, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

We experimentally show that our refiner can provide a clearer harmonic structure of speech and improves the reference-free metric of perceptual quality for arbitrary preceding model architectures.

Denoising Speech Enhancement +1

Paper
Add Code

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

1 code implementation • 27 Oct 2022 • Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs.

Denoising Speech Enhancement

Paper
Code

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

no code implementations • 11 Oct 2022 • Kin Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji

In this paper we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT).

Music Transcription

Paper
Add Code

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

2 code implementations • 4 Jun 2022 • Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

Additionally, the report presents the baseline system that accompanies the dataset in the challenge with emphasis on the differences with the baseline of the previous iterations; namely, introduction of the multi-ACCDOA representation to handle multiple simultaneous occurences of events of the same class, and support for additional improved input features for the microphone array format.

Ranked #1 on Sound Event Localization and Detection on STARSS22

Sound Event Localization and Detection

Paper
Code

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization

1 code implementation • 16 May 2022 • Yuhta Takida, Takashi Shibuya, WeiHsiang Liao, Chieh-Hsin Lai, Junki Ohmura, Toshimitsu Uesaka, Naoki Murata, Shusuke Takahashi, Toshiyuki Kumakura, Yuki Mitsufuji

In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called stochastically quantized variational autoencoder (SQ-VAE).

Quantization

167

Paper
Code

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

2 code implementations • 14 Oct 2021 • Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Naoya Takahashi, Emiru Tsunoo, Yuki Mitsufuji

The multi- ACCDOA format (a class- and track-wise output format) enables the model to solve the cases with overlaps from the same class.

Sound Event Localization and Detection

Paper
Code

Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

1 code implementation • 12 Oct 2021 • Ricardo Falcon-Perez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

Data augmentation methods have shown great importance in diverse supervised learning problems where labeled data is scarce or costly to obtain.

Data Augmentation Sound Event Localization and Detection

Paper
Code

Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models

no code implementations • 12 Oct 2021 • Ryosuke Sawata, Yosuke Kashiwagi, Shusuke Takahashi

In order to optimize the DNN-based SE model in terms of the character error rate (CER), which is one of the metric to evaluate the ASR system and generally non-differentiable, our method uses two DNNs: one for speech processing and one for mimicking the output CERs derived through an acoustic model (AM).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

no code implementations • 21 Jun 2021 • Kazuki Shimada, Naoya Takahashi, Yuichiro Koyama, Shusuke Takahashi, Emiru Tsunoo, Masafumi Takahashi, Yuki Mitsufuji

This report describes our systems submitted to the DCASE2021 challenge task 3: sound event localization and detection (SELD) with directional interference.

Data Augmentation Sound Event Localization and Detection

Paper
Add Code

Manifold-Aware Deep Clustering: Maximizing Angles between Embedding Vectors Based on Regular Simplex

no code implementations • 4 Jun 2021 • Keitaro Tanaka, Ryosuke Sawata, Shusuke Takahashi

This paper presents a new deep clustering (DC) method called manifold-aware DC (M-DC) that can enhance hyperspace utilization more effectively than the original DC.

Clustering Deep Clustering

Paper
Add Code

Preventing Oversmoothing in VAE via Generalized Variance Parameterization

no code implementations • 17 Feb 2021 • Yuhta Takida, Wei-Hsiang Liao, Chieh-Hsin Lai, Toshimitsu Uesaka, Shusuke Takahashi, Yuki Mitsufuji

Variational autoencoders (VAEs) often suffer from posterior collapse, which is a phenomenon in which the learned latent space becomes uninformative.

Decoder

Paper
Add Code

AR-ELBO: Preventing Posterior Collapse Induced by Oversmoothing in Gaussian VAE

no code implementations • 1 Jan 2021 • Yuhta Takida, Wei-Hsiang Liao, Toshimitsu Uesaka, Shusuke Takahashi, Yuki Mitsufuji

Variational autoencoders (VAEs) often suffer from posterior collapse, which is a phenomenon that the learned latent space becomes uninformative.

Paper
Add Code

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

2 code implementations • 29 Oct 2020 • Kazuki Shimada, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji

Conventional NN-based methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target.

Event Detection Sound Event Detection +1

Paper
Code

All for One and One for All: Improving Music Separation by Bridging Networks

5 code implementations • 8 Oct 2020 • Ryosuke Sawata, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji

This paper proposes several improvements for music separation with deep neural networks (DNNs), namely a multi-domain loss (MDL) and two combination schemes.

Ranked #21 on Music Source Separation on MUSDB18

Music Source Separation

2,140

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.