Search Results for author: Hirokazu Kameoka

Found 32 papers, 14 papers with code

Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator

no code implementations • 25 Mar 2024 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka

A generative adversarial network (GAN)-based vocoder trained with an adversarial discriminator is commonly used for speech synthesis because of its fast, lightweight, and high-quality characteristics.

Data Augmentation, Generative Adversarial Network, and 1 more

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

no code implementations • 14 Aug 2023 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

Because a 1D CNN has difficulty modeling high-dimensional spectrograms, the frequency dimension is reduced via temporal upsampling.

Speech Synthesis

Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis

no code implementations • 24 Mar 2023 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

The Wave-U-Net discriminator architecture provides the generator with sufficiently rich information for the synthesized speech to closely match real speech.

Generative Adversarial Network, Speech Synthesis

DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion

no code implementations • 20 Oct 2022 • Chihiro Watanabe, Hirokazu Kameoka

In this paper, we propose a new variational-autoencoder-based voice conversion model accompanied by an auxiliary network, which ensures that the conversion result correctly reflects the specified F0/timbre information.

Voice Conversion

Speak Like a Dog: Human to Non-human creature Voice Conversion

1 code implementation • 9 Jun 2022 • Kohei Suzuki, Shoki Sakamoto, Tadahiro Taniguchi, Hirokazu Kameoka

This paper proposes a new voice conversion (VC) task from human speech to dog-like speech while preserving linguistic information as an example of human to non-human creature voice conversion (H2NH-VC) tasks.

Generative Adversarial Network, Voice Conversion

iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform

1 code implementation • 4 Mar 2022 • Takuhiro Kaneko, Kou Tanaka, Hirokazu Kameoka, Shogo Seki

In recent text-to-speech synthesis and voice conversion systems, a mel-spectrogram is commonly applied as an intermediate representation, and the necessity for a mel-spectrogram vocoder is increasing.

Speech Synthesis, Text-To-Speech Synthesis, and 1 more
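
The iSTFTNet title refers to incorporating an inverse short-time Fourier transform into a mel-spectrogram vocoder. As a hedged sketch of that final stage only, the NumPy code below rebuilds a waveform from a magnitude array and a phase array by windowed overlap-add. The helper names, the periodic Hann window, and the sizes (n_fft=16, hop=4) are assumptions chosen for illustration, not the paper's configuration. In a vocoder a CNN would predict the magnitude and phase; here they are taken from an analysis STFT so that the round trip can be checked.

```python
import numpy as np

def periodic_hann(n_fft):
    # Periodic Hann window of length n_fft (drop the last point of a length n_fft + 1 window).
    return np.hanning(n_fft + 1)[:-1]

def stft(x, n_fft, hop):
    """Complex STFT of a 1-D signal without padding: shape (n_frames, n_fft // 2 + 1)."""
    window = periodic_hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(mag, phase, n_fft, hop):
    """Inverse STFT by windowed overlap-add, normalized by the summed squared window."""
    window = periodic_hann(n_fft)
    # Rebuild each time-domain frame from magnitude and phase, then apply the synthesis window.
    frames = np.fft.irfft(mag * np.exp(1j * phase), n=n_fft, axis=1) * window
    n_frames = frames.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    win_sum = np.zeros(length)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i]
        win_sum[i * hop:i * hop + n_fft] += window ** 2
    # Divide out the accumulated window energy wherever it is non-negligible.
    nonzero = win_sum > 1e-8
    out[nonzero] /= win_sum[nonzero]
    return out

# Hypothetical sizes: a 16-point FFT (9 frequency bins) with a hop of 4 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
spec = stft(x, n_fft=16, hop=4)
y = istft(np.abs(spec), np.angle(spec), n_fft=16, hop=4)
```

Away from the first and last frame, `y` matches `x` to floating-point precision. The small FFT size is the point of the illustration: with only 9 frequency bins per frame, the network in front of the iSTFT has a low-dimensional output to predict.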

StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition

no code implementations • 10 Aug 2021 • Shoki Sakamoto, Akira Taniguchi, Tadahiro Taniguchi, Hirokazu Kameoka

Although StarGAN-based voice conversion is powerful, it can fail to preserve the linguistic content of input speech when the number of available training samples is extremely small.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), and 3 more

StarGAN-based Emotional Voice Conversion for Japanese Phrases

no code implementations • 5 Apr 2021 • Asuka Moritani, Ryo Ozaki, Shoki Sakamoto, Hirokazu Kameoka, Tadahiro Taniguchi

Through subjective evaluation experiments, we evaluated how well our StarGAN-EVC system achieves emotional voice conversion (EVC) for Japanese phrases.

Voice Conversion

MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames

3 code implementations • 25 Feb 2021 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

With filling in frames (FIF), we apply a temporal mask to the input mel-spectrogram and encourage the converter to fill in the missing frames based on the surrounding frames.

Voice Conversion
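
The FIF idea above can be illustrated with a minimal NumPy sketch of the input preparation step: zero out a random contiguous run of frames in a mel-spectrogram and keep the mask. The helper name `fif_inputs`, the maximum mask length of 25 frames, and passing the mask to the converter as a second channel are assumptions made for illustration; the converter network and the CycleGAN losses are not shown.

```python
import numpy as np

def fif_inputs(mel, max_mask_len, rng):
    """Mask a random contiguous run of frames (hypothetical helper).

    mel: array of shape (n_mels, n_frames).
    Returns (masked_mel, mask); mask is 0 on masked frames and 1 elsewhere.
    """
    _, n_frames = mel.shape
    assert 0 <= max_mask_len <= n_frames
    mask = np.ones_like(mel)
    # Random mask length in [0, max_mask_len] and a start frame that keeps the run in range.
    mask_len = int(rng.integers(0, max_mask_len + 1))
    start = int(rng.integers(0, n_frames - mask_len + 1))
    # The same frames are hidden in every mel bin, i.e. the mask is purely temporal.
    mask[:, start:start + mask_len] = 0.0
    return mel * mask, mask

rng = np.random.default_rng(0)
mel = rng.standard_normal((80, 64))  # stand-in for a log-mel-spectrogram
masked_mel, mask = fif_inputs(mel, max_mask_len=25, rng=rng)
# Assumption: the converter receives the masked input and the mask as two channels.
converter_input = np.stack([masked_mel, mask])  # shape (2, 80, 64)
```

Giving the converter the mask tells it which frames are missing, so during training it has to infer those frames from the surrounding context rather than copy them.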

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion

2 code implementations • 22 Oct 2020 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion.

Voice Conversion

X-DC: Explainable Deep Clustering based on Learnable Spectrogram Templates

no code implementations • 18 Sep 2020 • Chihiro Watanabe, Hirokazu Kameoka

In particular, it has been shown that a monaural speech separation task can be successfully solved with a DNN-based method called deep clustering (DC), which uses a DNN to describe the process of assigning a continuous vector to each time-frequency (TF) bin and measuring how likely each pair of TF bins is to be dominated by the same speaker.

Clustering, Deep Clustering, and 1 more
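
The pairwise measure described above is usually trained with the standard deep clustering objective, which compares predicted and ideal TF-bin affinities. The sketch below shows that generic objective, which X-DC builds on, and not X-DC's own template-based model. It computes the loss both directly and in the expanded form that avoids building N x N matrices. The toy sizes, random embeddings, and random speaker labels are assumptions for illustration.

```python
import numpy as np

def dc_loss_naive(V, Y):
    # (V V^T)[i, j]: predicted affinity of TF bins i and j (unit-norm embeddings).
    # (Y Y^T)[i, j]: 1 if bins i and j are dominated by the same speaker, else 0.
    return float(np.sum((V @ V.T - Y @ Y.T) ** 2))

def dc_loss_efficient(V, Y):
    # Same Frobenius-norm value, expanded so only D x D, D x C and C x C products are formed.
    return float(np.sum((V.T @ V) ** 2) - 2 * np.sum((V.T @ Y) ** 2) + np.sum((Y.T @ Y) ** 2))

rng = np.random.default_rng(0)
n_bins, emb_dim, n_speakers = 200, 20, 2  # toy sizes: N TF bins, D-dim embeddings, C speakers
V = rng.standard_normal((n_bins, emb_dim))
V /= np.linalg.norm(V, axis=1, keepdims=True)  # unit-norm embedding per TF bin
labels = rng.integers(0, n_speakers, size=n_bins)
Y = np.eye(n_speakers)[labels]  # one-hot dominant-speaker indicator per TF bin
```

Both functions return the same value; the expanded form matters in practice because N (frames times frequency bins) is large. Using the one-hot labels themselves as embeddings gives a loss of exactly zero.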

Nonparallel Voice Conversion with Augmented Classifier Star Generative Adversarial Networks

1 code implementation • 27 Aug 2020 • Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

We previously proposed a method that allows for nonparallel voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN.

Voice Conversion

Pretraining Techniques for Sequence-to-Sequence Voice Conversion

1 code implementation • 7 Aug 2020 • Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda

Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), and 2 more

Many-to-Many Voice Transformer Network

no code implementations • 18 May 2020 • Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda

The main idea we propose is an extension of the original Voice Transformer Network (VTN) that can simultaneously learn mappings among multiple speakers.

Voice Conversion

Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining

1 code implementation • 14 Dec 2019 • Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda

We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based on the Transformer architecture with text-to-speech (TTS) pretraining.

Voice Conversion

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling

Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.

Person Recognition, Speaker Verification, and 2 more

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion

3 code implementations • 29 Jul 2019 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

To bridge this gap, we rethink conditional methods of StarGAN-VC, which are key components for achieving non-parallel multi-domain VC in a single model, and propose an improved variant called StarGAN-VC2.

Voice Conversion

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

6 code implementations • 9 Apr 2019 • Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data.

Voice Conversion

Crossmodal Voice Conversion

no code implementations • 9 Apr 2019 • Hirokazu Kameoka, Kou Tanaka, Aaron Valero Puche, Yasunori Ohishi, Takuhiro Kaneko

We use the latent code of an input face image encoded by the face encoder as the auxiliary input into the speech converter and train the speech converter so that the original latent code can be recovered from the generated speech by the voice encoder.

Voice Conversion

WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation

no code implementations • 5 Apr 2019 • Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo

WaveCycleGAN was recently proposed to bridge the gap between natural and synthesized speech waveforms in statistical parametric speech synthesis. It provides fast inference by using a moving average model rather than an autoregressive model, and high-quality speech synthesis through adversarial training.

Speech Synthesis

Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform

no code implementations • 29 Mar 2019 • Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi

Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model.
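
To make the idea of an STFT-based loss concrete, the NumPy sketch below computes a generic log-magnitude spectral distance between a generated and a reference waveform. The function names, Hann window, frame length (512), hop (128), and the mean-squared log-magnitude form are assumptions; the paper's exact loss terms, and its continuous-wavelet-transform counterpart, are not reproduced here. During training, the same computation would be written in an autodiff framework so that gradients reach the waveform model.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT of a 1-D signal with a Hann window and no padding."""
    assert len(x) >= n_fft
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_fft // 2 + 1)

def log_stft_loss(y_gen, y_ref, n_fft=512, hop=128, eps=1e-7):
    """Mean squared difference of log STFT magnitudes (hypothetical formulation)."""
    s_gen = stft_mag(y_gen, n_fft, hop)
    s_ref = stft_mag(y_ref, n_fft, hop)
    # eps keeps the logarithm finite in near-silent TF bins.
    return float(np.mean((np.log(s_gen + eps) - np.log(s_ref + eps)) ** 2))

sr = 16000
t = np.arange(sr) / sr
ref = np.sin(2 * np.pi * 440 * t)  # 1 s, 440 Hz reference tone
gen = ref + 0.05 * np.random.default_rng(0).standard_normal(sr)  # stand-in "generated" signal
loss_same = log_stft_loss(ref, ref)
loss_noisy = log_stft_loss(gen, ref)
```

A loss of this kind compares spectral envelopes and harmonics frame by frame instead of sample by sample, so it does not penalize small waveform misalignments the way a sample-wise error would.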

Fast MVAE: Joint separation and classification of mixed sources based on multichannel variational autoencoder with auxiliary classifier

no code implementations • 16 Dec 2018 • Li Li, Hirokazu Kameoka, Shoji Makino

Although the multichannel variational autoencoder (MVAE) method is notable for its impressive source separation performance, its convergence-guaranteed optimization algorithm, and its ability to estimate source-class labels simultaneously with source separation, it still has two major drawbacks: high computational complexity and unsatisfactory source classification accuracy.

Classification, General Classification

AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms

no code implementations • 9 Nov 2018 • Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo

This paper describes a method based on sequence-to-sequence (Seq2Seq) learning with attention and context preservation mechanisms for voice conversion (VC) tasks.

Image Captioning, Machine Translation, and 4 more

ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion

no code implementations • 5 Nov 2018 • Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo

It also achieves many-to-many conversion by simultaneously learning mappings among multiple speakers with a single model, instead of learning a separate model for each speaker pair.

Speech Enhancement, Voice Conversion

Generalized Multichannel Variational Autoencoder for Underdetermined Source Separation

no code implementations • 29 Sep 2018 • Shogo Seki, Hirokazu Kameoka, Li Li, Tomoki Toda, Kazuya Takeda

This paper deals with a multichannel audio source separation problem under underdetermined conditions.

Audio Source Separation

WaveCycleGAN: Synthetic-to-natural speech waveform conversion using cycle-consistent adversarial networks

no code implementations • 25 Sep 2018 • Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Hirokazu Kameoka

The experimental results demonstrate that our proposed method can 1) alleviate the over-smoothing effect of the acoustic features despite the direct modification method used for the waveform and 2) greatly improve the naturalness of the generated speech sounds.

Speech Synthesis, Voice Conversion

ACVAE-VC: Non-parallel many-to-many voice conversion with auxiliary classifier variational autoencoder

2 code implementations • 13 Aug 2018 • Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

Such situations can be avoided by introducing an auxiliary classifier and training the encoder and decoder so that the attribute classes of the decoder outputs are correctly predicted by the classifier.

Attribute, Voice Conversion

Semi-blind source separation with multichannel variational autoencoder

1 code implementation • 2 Aug 2018 • Hirokazu Kameoka, Li Li, Shota Inoue, Shoji Makino

This paper proposes a multichannel source separation technique called the multichannel variational autoencoder (MVAE) method, which uses a conditional VAE (CVAE) to model and estimate the power spectrograms of the sources in a mixture.

Blind Source Separation

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

13 code implementations • 6 Jun 2018 • Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo

This paper proposes a method that allows non-parallel many-to-many voice conversion (VC) by using a variant of a generative adversarial network (GAN) called StarGAN.

Attribute, Generative Adversarial Network, and 1 more

Generative adversarial network-based approach to signal reconstruction from magnitude spectrograms

no code implementations • 6 Apr 2018 • Keisuke Oyamada, Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo, Hiroyasu Ando

In this paper, we address the problem of reconstructing a time-domain signal (or a phase spectrogram) solely from a magnitude spectrogram.

Generative Adversarial Network

Speech waveform synthesis from MFCC sequences with generative adversarial networks

1 code implementation • 3 Apr 2018 • Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

This paper proposes a method for generating speech from filterbank mel frequency cepstral coefficients (MFCC), which are widely used in speech applications such as automatic speech recognition (ASR) but are generally considered unusable for speech synthesis.

Generative Adversarial Network, Speech Synthesis

Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

9 code implementations • 30 Nov 2017 • Takuhiro Kaneko, Hirokazu Kameoka

A subjective evaluation showed that the quality of the converted speech was comparable to that obtained with a Gaussian mixture model-based method trained under advantageous conditions, i.e., with parallel data and twice the amount of data.

Voice Conversion
