Search Results for author: Hainan Xu

Found 16 papers, 6 papers with code

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

no code implementations 20 Mar 2024 Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention.

Keyword Spotting

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

1 code implementation 18 Sep 2019 Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.

Automatic Speech Recognition (ASR) +5

Saliency-driven Word Alignment Interpretation for Neural Machine Translation

1 code implementation WS 2019 Shuoyang Ding, Hainan Xu, Philipp Koehn

Despite their original goal to jointly learn to align and translate, Neural Machine Translation (NMT) models, especially Transformer, are often perceived as not learning interpretable word alignments.

Machine Translation NMT +2

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks

1 code implementation Interspeech 2018 Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur

Time Delay Neural Networks (TDNNs), also known as one-dimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition.

Speech Recognition

Neural Network Language Modeling with Letter-based Features and Importance Sampling

no code implementations ICASSP 2018 Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.

Ranked #36 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition (ASR) +2

A GPU-based WFST Decoder with Exact Lattice Generation

no code implementations 9 Apr 2018 Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs).

Scheduling

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

no code implementations 27 Mar 2018 Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art, simplified single system comparable to the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe through the main repository of the Kaldi speech recognition toolkit.

Automatic Speech Recognition (ASR) +5
