Search Results for author: Hainan Xu

Found 16 papers, 6 papers with code

Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition

no code implementations • 4 Apr 2024 • Hainan Xu, Zhehuai Chen, Fei Jia, Boris Ginsburg

This paper proposes Transducers with Pronunciation-aware Embeddings (PET).

Automatic Speech Recognition speech-recognition +1

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

no code implementations • 20 Mar 2024 • Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention.

Keyword Spotting

Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

1 code implementation • 26 Sep 2023 • Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur

Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

774

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

no code implementations • 1 Jun 2023 • Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

1 code implementation • 13 Apr 2023 • Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg

TDT models for Speech Recognition achieve better accuracy and up to 2.82X faster inference than conventional Transducers.

Ranked #1 on Speech Recognition on facebook/multilingual_librispeech german

Intent Classification Intent Classification and Slot Filling +3

10,073

Multi-blank Transducers for Speech Recognition

1 code implementation • 4 Nov 2022 • Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg

This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

10,073

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

1 code implementation • 18 Sep 2019 • Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.

Ranked #1 on Speech Recognition on Hub5'00 CallHome

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

941

Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings

no code implementations • WS 2019 • Mahsa Yarmohammadi, Xutai Ma, Sorami Hisamoto, Muhammad Rahman, Yiming Wang, Hainan Xu, Daniel Povey, Philipp Koehn, Kevin Duh

Cross-Lingual Information Retrieval Retrieval

Saliency-driven Word Alignment Interpretation for Neural Machine Translation

1 code implementation • WS 2019 • Shuoyang Ding, Hainan Xu, Philipp Koehn

Despite their original goal to jointly learn to align and translate, Neural Machine Translation (NMT) models, especially Transformer, are often perceived as not learning interpretable word alignments.

Machine Translation NMT +2

Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling

no code implementations • 10 Nov 2018 • Hainan Xu, Shuoyang Ding, Shinji Watanabe

Most end-to-end speech recognition systems model text directly as a sequence of characters or sub-words.

Automatic Speech Recognition (ASR) speech-recognition

The JHU Parallel Corpus Filtering Systems for WMT 2018

no code implementations • WS 2018 • Huda Khayrallah, Hainan Xu, Philipp Koehn

This work describes our submission to the WMT18 Parallel Corpus Filtering shared task.

Language Modelling Machine Translation +2

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks

1 code implementation • Interspeech 2018 • Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur

Time Delay Neural Networks (TDNNs), also known as one-dimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition.

speech-recognition Speech Recognition

143

Neural Network Language Modeling with Letter-based Features and Importance Sampling

no code implementations • ICASSP 2018 • Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.

Ranked #36 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A GPU-based WFST Decoder with Exact Lattice Generation

no code implementations • 9 Apr 2018 • Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs).

Scheduling

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

no code implementations • 27 Mar 2018 • Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge, intended to promote the development of noisy ASR in speech processing communities by providing 1) a state-of-the-art yet simplified single system comparable to the complicated top systems in the challenge, and 2) a publicly available and reproducible recipe in the main repository of the Kaldi speech recognition toolkit.

Ranked #2 on Noisy Speech Recognition on CHiME real

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Zipporah: a Fast and Scalable Data Cleaning System for Noisy Web-Crawled Parallel Corpora

no code implementations • EMNLP 2017 • Hainan Xu, Philipp Koehn

We introduce Zipporah, a fast and scalable data cleaning system.

Language Modelling Machine Translation +3
