Search Results for author: Daniel Povey

Found 37 papers, 24 papers with code

On Speaker Attribution with SURT

1 code implementation • 28 Jan 2024 • Desh Raj, Matthew Wiesner, Matthew Maciejewski, Leibny Paola Garcia-Perera, Daniel Povey, Sanjeev Khudanpur

The Streaming Unmixing and Recognition Transducer (SURT) has recently become a popular framework for continuous, streaming, multi-talker speech recognition (ASR).

speech-recognition Speech Recognition

789

Paper
Code

Zipformer: A faster and better encoder for automatic speech recognition

1 code implementation • 17 Oct 2023 • Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey

The Conformer has become the most popular encoder model for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

789

Paper
Code

Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

1 code implementation • 26 Sep 2023 • Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur

Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

789

Paper
Code

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

1 code implementation • 15 Sep 2023 • Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey

In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50, 000 hours of read English speech derived from LibriVox.

137

Paper
Code

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

1 code implementation • 14 Sep 2023 • Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen

Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language processing techniques.

Self-Supervised Learning speech-recognition +2

789

Paper
Code

PromptASR for contextualized ASR with controllable style

2 code implementations • 14 Sep 2023 • Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey

An additional style prompt can be given to the text encoder and guide the ASR system to output different styles of transcriptions.

Automatic Speech Recognition speech-recognition +1

789

Paper
Code

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

no code implementations • 12 Aug 2023 • Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens.

Automatic Speech Recognition speech-recognition +1

Paper
Add Code

SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition

1 code implementation • 18 Jun 2023 • Desh Raj, Daniel Povey, Sanjeev Khudanpur

The Streaming Unmixing and Recognition Transducer (SURT) model was proposed recently as an end-to-end approach for continuous, streaming, multi-talker speech recognition (ASR).

Decoder Domain Adaptation +2

789

Paper
Code

Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

no code implementations • 1 Jun 2023 • Dongji Gao, Matthew Wiesner, Hainan Xu, Leibny Paola Garcia, Daniel Povey, Sanjeev Khudanpur

Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Delay-penalized CTC implemented based on Finite State Transducer

1 code implementation • 19 May 2023 • Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey

Our work is open-sourced and publicly available https://github. com/k2-fsa/k2.

Attribute

1,046

Paper
Code

Blank-regularized CTC for Frame Skipping in Neural Transducer

1 code implementation • 19 May 2023 • Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey

Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems.

Automatic Speech Recognition speech-recognition +1

789

Paper
Code

GPU-accelerated Guided Source Separation for Meeting Transcription

2 code implementations • 10 Dec 2022 • Desh Raj, Daniel Povey, Sanjeev Khudanpur

In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide 300x speed-up over CPU-based inference.

Ranked #2 on Speech Recognition on LibriCSS

blind source separation Target Speaker Extraction

Paper
Code

Fast and parallel decoding for transducer

1 code implementation • 31 Oct 2022 • Wei Kang, Liyong Guo, Fangjun Kuang, Long Lin, Mingshuang Luo, Zengwei Yao, Xiaoyu Yang, Piotr Żelasko, Daniel Povey

In this work, we introduce a constrained version of transducer loss to learn strictly monotonic alignments between the sequences; we also improve the standard greedy search and beam search algorithms by limiting the number of symbols that can be emitted per time step in transducer decoding, making it more efficient to decode in parallel with batches.

speech-recognition Speech Recognition

789

Paper
Code

Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

1 code implementation • 31 Oct 2022 • Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Zelasko, Daniel Povey

Although on-the-fly teacher label generation tackles this issue, the training speed is significantly slower as the teacher model has to be evaluated every batch.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

789

Paper
Code

Delay-penalized transducer for low-latency streaming ASR

1 code implementation • 31 Oct 2022 • Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey

In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimum impact on recognition accuracy.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

1,046

Paper
Code

Pruned RNN-T for fast, memory-efficient ASR training

no code implementations • 23 Jun 2022 • Fangjun Kuang, Liyong Guo, Wei Kang, Long Lin, Mingshuang Luo, Zengwei Yao, Daniel Povey

The RNN-Transducer (RNN-T) framework for speech recognition has been growing in popularity, particularly for deployed real-time ASR systems, because it combines high accuracy with naturally streaming recognition.

Decoder speech-recognition +1

Paper
Add Code

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

2 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10, 000 hours of high quality labeled audio suitable for supervised training, and 40, 000 hours of total audio suitable for semi-supervised and unsupervised training.

Ranked #1 on Speech Recognition on GigaSpeech

Sentence speech-recognition +1

604

Paper
Code

speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

2 code implementations • 3 Apr 2021 • Junbo Zhang, Zhiwen Zhang, Yongqing Wang, Zhiyong Yan, Qiong Song, YuKai Huang, Ke Li, Daniel Povey, Yujun Wang

This paper introduces a new open-source speech corpus named "speechocean762" designed for pronunciation assessment use, consisting of 5000 English utterances from 250 non-native speakers, where half of the speakers are children.

Ranked #7 on Phone-level pronunciation scoring on speechocean762

Phone-level pronunciation scoring Sentence +1

13,770

Paper
Code

A Parallelizable Lattice Rescoring Strategy with Neural Language Models

1 code implementation • 8 Mar 2021 • Ke Li, Daniel Povey, Sanjeev Khudanpur

This paper proposes a parallel computation strategy and a posterior-based lattice expansion algorithm for efficient lattice rescoring with neural language models (LMs) for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

13,768

Paper
Code

Wake Word Detection with Streaming Transformers

no code implementations • 8 Feb 2021 • Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Modern wake word detection systems usually rely on neural networks for acoustic modeling.

Paper
Add Code

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

1 code implementation • 3 Nov 2020 • Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Several advances have been made recently towards handling overlapping speech for speaker diarization.

Audio and Speech Processing Sound

Paper
Code

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

1 code implementation • 20 May 2020 • Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We present PyChain, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called \emph{chain models} in the Kaldi automatic speech recognition (ASR) toolkit.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

214

Paper
Code

Wake Word Detection with Alignment-Free Lattice-Free MMI

1 code implementation • 17 May 2020 • Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Always-on spoken language interfaces, e. g. personal digital assistants, rely on a wake word to start processing spoken input.

Decoder

13,768

Paper
Code

CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings

no code implementations • 20 Apr 2020 • Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6).

speaker-diarization Speaker Diarization +4

Paper
Add Code

Speaker Diarization with Region Proposal Network

1 code implementation • 14 Feb 2020 • Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.

Region Proposal speaker-diarization +1

Paper
Code

GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition

1 code implementation • 22 Oct 2019 • Hugo Braun, Justin Luitjens, Ryan Leary, Tim Kaldewey, Daniel Povey

We present an optimized weighted finite-state transducer (WFST) decoder capable of online streaming and offline batch processing of audio using Graphics Processing Units (GPUs).

Decoder speech-recognition +1

Paper
Code

Probing the Information Encoded in X-vectors

no code implementations • 13 Sep 2019 • Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks.

Data Augmentation Sentence +3

Paper
Add Code

Robust Document Representations for Cross-Lingual Information Retrieval in Low-Resource Settings

no code implementations • WS 2019 • Mahsa Yarmohammadi, Xutai Ma, Sorami Hisamoto, Muhammad Rahman, Yiming Wang, Hainan Xu, Daniel Povey, Philipp Koehn, Kevin Duh

Cross-Lingual Information Retrieval Retrieval

Paper
Add Code

End-to-end speech recognition using lattice-free MMI

no code implementations • Interspeech 2018 2018 • Hossein Hadian, Hossein Sameti, Daniel Povey, Sanjeev Khudanpur

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models.

Ranked #1 on Speech Recognition on Switchboard (300hr)

speech-recognition Speech Recognition

Paper
Add Code

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks

1 code implementation • Interspeech 2018 2018 • Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur

Time Delay Neural Networks (TDNNs), also known as onedimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition.

speech-recognition Speech Recognition

143

Paper
Code

Neural Network Language Modeling with Letter-based Features and Importance Sampling

no code implementations • ICASSP 2018 • Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.

Ranked #36 on Speech Recognition on LibriSpeech test-other (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

A GPU-based WFST Decoder with Exact Lattice Generation

no code implementations • 9 Apr 2018 • Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs).

Decoder Scheduling

Paper
Add Code

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

no code implementations • 12 Jun 2017 • Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations.

speech-recognition Speech Recognition

Paper
Add Code

Purely sequence-trained neural networks for ASR based on lattice-free MMI

no code implementations • INTERSPEECH 2016 2016 • Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahrmani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur

Models trained with LFMMI provide a relative word error rate reduction of ∼11. 5%, over those trained with cross-entropy objective function, and ∼8%, over those trained with cross-entropy and sMBR objective functions.

Ranked #4 on Speech Recognition on WSJ eval92

Language Modelling Speech Recognition

Paper
Add Code

MUSAN: A Music, Speech, and Noise Corpus

2 code implementations • 28 Oct 2015 • David Snyder, Guoguo Chen, Daniel Povey

This report introduces a new corpus of music, speech, and noise.

Sound

Paper
Code

A Coarse-Grained Model for Optimal Coupling of ASR and SMT Systems for Speech Translation

no code implementations • EMNLP 2015 • Gaurav Kumar, Graeme Blackwood, Jan Trmal, Daniel Povey, Sanjeev Khudanpur

Language Modelling Machine Translation +2

Paper
Add Code

Parallel training of DNNs with Natural Gradient and Parameter Averaging

1 code implementation • 27 Oct 2014 • Daniel Povey, Xiaohui Zhang, Sanjeev Khudanpur

However, we have another method, an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which seems to allow our periodic-averaging method to work well, as well as substantially improving the convergence of SGD on a single machine.

speech-recognition Speech Recognition

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.