Unsupervised Speech Domain Adaptation Based on Disentangled Representation Learning for Robust Speech Recognition

12 Apr 2019 · Jong-Hyeon Park, Myungwoo Oh, Hyung-Min Park

In general, the performance of automatic speech recognition (ASR) systems degrades significantly when the training and test environments are mismatched. Recently, a deep-learning-based image-to-image translation technique was presented that translates an image from a source domain to a desired domain, and the cycle-consistent adversarial network (CycleGAN) was applied to learn a mapping for speech-to-speech conversion from a source speaker to a target speaker...

Methods used in the Paper


METHOD                   | TYPE
Batch Normalization      | Normalization
Residual Connection      | Skip Connections
PatchGAN                 | Discriminators
ReLU                     | Activation Functions
Tanh Activation          | Activation Functions
Residual Block           | Skip Connection Blocks
Instance Normalization   | Normalization
Convolution              | Convolutions
Leaky ReLU               | Activation Functions
Sigmoid Activation       | Activation Functions
GAN Least Squares Loss   | Loss Functions
Cycle Consistency Loss   | Loss Functions
CycleGAN                 | Generative Models
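
The two loss functions listed above can be illustrated with a small sketch. This page does not give the paper's exact formulation or weighting, so what follows is only the commonly used form of the cycle consistency loss (L1 reconstruction in both directions) and of the least-squares GAN loss (targets 1 for real, 0 for generated). The function names, the toy generators, and the use of plain Python lists as stand-ins for speech features are all assumptions made for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Generator = Callable[[Vector], Vector]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x: Vector, y: Vector,
                           g_xy: Generator, g_yx: Generator) -> float:
    """L1 cycle loss: x -> g_xy -> g_yx should recover x, and likewise for y."""
    forward = l1_distance(g_yx(g_xy(x)), x)
    backward = l1_distance(g_xy(g_yx(y)), y)
    return forward + backward


def lsgan_discriminator_loss(d_real: Vector, d_fake: Vector) -> float:
    """Least-squares discriminator loss: push real scores to 1, fake scores to 0."""
    real_term = sum((s - 1.0) ** 2 for s in d_real) / len(d_real)
    fake_term = sum(s ** 2 for s in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(d_fake: Vector) -> float:
    """Least-squares generator loss: push discriminator scores on fakes to 1."""
    return 0.5 * sum((s - 1.0) ** 2 for s in d_fake) / len(d_fake)


# Hypothetical toy "generators": doubling and halving are exact inverses,
# so a round trip through both reproduces the input.
double = lambda v: [2.0 * t for t in v]
halve = lambda v: [t / 2.0 for t in v]

perfect_cycle = cycle_consistency_loss([1.0, 2.0], [4.0, 6.0], double, halve)
```

With generators that invert each other exactly, the cycle term vanishes; in training it is typically added to the adversarial losses with a weighting factor, which is a hyperparameter not stated on this page.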