TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Distant Speech Recognition	CHiME-4 real 6ch	HMM-TDNN(LFMMI) + LSTMLM + NN-GEV	Word Error Rate (WER)	2.74	#2
Noisy Speech Recognition	CHiME real	HMM-TDNN(LFMMI) + LSTMLM	Percentage error	11.4	#2

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

27 Mar 2018 · Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge. It aims to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art single system that is simpler than, yet comparable to, the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe through the main repository of the Kaldi speech recognition toolkit. The proposed system adopts generalized eigenvalue beamforming with bidirectional long short-term memory (LSTM) mask estimation. We also propose to use a time delay neural network (TDNN) based on the lattice-free version of the maximum mutual information (LF-MMI) criterion, trained on augmented data from all six microphones plus the enhanced data after beamforming. Finally, we use an LSTM language model for lattice and n-best re-scoring. The final system achieved 2.74% WER on the real test set in the 6-channel track, which corresponds to 2nd place in the challenge. In addition, the proposed baseline recipe includes four different speech enhancement measures for the simulation test set: short-time objective intelligibility measure (STOI), extended STOI (eSTOI), perceptual evaluation of speech quality (PESQ) and speech distortion ratio (SDR). Thus, the recipe also provides an experimental platform for speech enhancement studies with these performance measures.
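As a rough illustration of the generalized eigenvalue (GEV) beamforming step named in the abstract, the sketch below computes beamformer weights for a single frequency bin with NumPy. It is not the recipe's code: the function names (`masked_psd`, `gev_weights`, `output_snr`) are hypothetical, and the speech and noise masks are assumed to be given as inputs, whereas the described system estimates them with a bidirectional LSTM. The noise covariance matrix is also assumed to be invertible.

```python
import numpy as np


def masked_psd(stft_bin, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    stft_bin: (T, C) complex observations (T frames, C microphones).
    mask:     (T,) non-negative weights, e.g. a speech or noise mask.
    """
    weighted = stft_bin * mask[:, None]
    # sum_t mask[t] * y[t] y[t]^H, normalised by the total mask weight
    return weighted.T @ stft_bin.conj() / max(mask.sum(), 1e-10)


def gev_weights(phi_speech, phi_noise):
    """Principal generalized eigenvector of (phi_speech, phi_noise).

    Solves phi_speech w = lambda * phi_noise w and returns the w with the
    largest eigenvalue, which maximises the output SNR.
    Assumes phi_noise is invertible.
    """
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(phi_noise, phi_speech))
    return eigvecs[:, np.argmax(eigvals.real)]


def output_snr(w, phi_speech, phi_noise):
    """Ratio of speech power to noise power at the beamformer output."""
    speech_power = (w.conj() @ phi_speech @ w).real
    noise_power = (w.conj() @ phi_noise @ w).real
    return speech_power / noise_power


def apply_beamformer(w, stft_bin):
    """Beamformer output w^H y[t] for every frame of one frequency bin."""
    return stft_bin @ w.conj()
```

In a full pipeline these functions would be applied independently to every frequency bin of the multichannel STFT, and the enhanced signal would then be passed to the acoustic model.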

Tasks

Automatic Speech Recognition

Automatic Speech Recognition (ASR)

Distant Speech Recognition

Language Modelling

Noisy Speech Recognition

Speech Enhancement

speech-recognition

Speech Recognition

Results from the Paper

Ranked #2 on Noisy Speech Recognition on CHiME real
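For readers unfamiliar with the word error rate (WER) metric used in these results, the following is a minimal sketch of how WER is commonly computed for one utterance: the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenisation are illustrative assumptions, not the challenge's official scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Corpus-level figures are usually obtained by summing the errors and the reference word counts over all utterances before dividing, rather than by averaging per-utterance rates.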

Methods

LSTM • Sigmoid Activation • Tanh Activation
