Search Results for author: Bhuvana Ramabhadran

Found 39 papers, 1 paper with code

Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data

no code implementations29 Feb 2024 Takaaki Saeki, Gary Wang, Nobuyuki Morioka, Isaac Elias, Kyle Kastner, Andrew Rosenberg, Bhuvana Ramabhadran, Heiga Zen, Françoise Beaufays, Hadar Shemtov

Without any transcribed speech in a new language, this TTS model can generate intelligible speech in more than 30 unseen languages (CER within 10% of ground truth).

Representation Learning, Speech Synthesis

O-1: Self-training with Oracle and 1-best Hypothesis

no code implementations14 Aug 2023 Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi

O-1 achieves 13% to 25% relative improvement over EMBR on the various datasets that make up SpeechStew, and a 12% relative reduction of the gap to the oracle WER compared with EMBR training on the in-house dataset.

Speech Recognition

Large-scale Language Model Rescoring on Long-form Data

no code implementations13 Jun 2023 Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley

In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR.

Language Modelling, Speech Recognition +1

Understanding Shared Speech-Text Representations

no code implementations27 Apr 2023 Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang

Recently, a number of approaches to train speech models by incorporating text into end-to-end models have been developed, with Maestro advancing state-of-the-art automatic speech recognition (ASR) and Speech Translation (ST) performance.

Automatic Speech Recognition (ASR) +2

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition

no code implementations16 Feb 2023 Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran

We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method to inject large-scale unpaired text into ILM during E2E training which improves rare-word speech recognition.
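
As a rough illustration of the joint objective described above (not the paper's exact formulation), a minimal sketch assuming hypothetical `e2e_loss` and `internal_lm_loss` methods on the model and an illustrative mixing weight:

```python
def jeit_step(paired_batch, text_batch, model, lam=0.1):
    """Hedged sketch of one JEIT-style training step (names and the
    weight `lam` are illustrative, not taken from the paper)."""
    # Usual E2E loss (e.g. a transducer loss) on paired (audio, transcript) data.
    l_e2e = model.e2e_loss(paired_batch["audio"], paired_batch["text"])
    # Internal-LM loss: run only the label decoder / prediction network on
    # large-scale unpaired text, so rare words can be injected into the ILM.
    l_ilm = model.internal_lm_loss(text_batch["text"])
    return l_e2e + lam * l_ilm
```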

Language Modelling, Speech Recognition +1

Modular Hybrid Autoregressive Transducer

no code implementations31 Oct 2022 Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno

In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder.
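
A structural sketch of that separation in PyTorch, with all dimensions assumed; a real transducer would also condition the label decoder on previous labels via a prediction network, which is omitted here for brevity:

```python
import torch
import torch.nn as nn

class MHATSketch(nn.Module):
    """Illustrative modular-HAT layout: shared acoustic encoder with
    structurally separate label and blank decoders (sizes are assumptions)."""
    def __init__(self, feat_dim=80, enc_dim=512, vocab_size=4096):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, enc_dim, num_layers=2, batch_first=True)
        self.label_decoder = nn.Linear(enc_dim, vocab_size)  # label distribution
        self.blank_decoder = nn.Linear(enc_dim, 1)           # blank probability

    def forward(self, feats):                                # feats: (B, T, feat_dim)
        enc, _ = self.encoder(feats)
        label_logprobs = torch.log_softmax(self.label_decoder(enc), dim=-1)
        blank_prob = torch.sigmoid(self.blank_decoder(enc))
        return label_logprobs, blank_prob
```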

Language Modelling, Speech Recognition +1

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

no code implementations18 Oct 2022 Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro Moreno, Nanxin Chen

First, we show that by combining speech representations with byte-level text representations and using language embeddings, we can dramatically reduce the Character Error Rate (CER) on languages with no supervised speech from 64.8% to 30.8%, a relative reduction of 53%.
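
The quoted relative reduction can be checked directly from the two CER values in the abstract:

```python
cer_before, cer_after = 64.8, 30.8                 # CER (%) without / with Maestro-U
relative_reduction = (cer_before - cer_after) / cer_before
print(f"{relative_reduction:.1%}")                 # -> 52.5%, i.e. roughly the quoted 53%
```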

Representation Learning, Speech Recognition +2

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition

no code implementations13 Sep 2022 Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno

We investigate a few approaches to increasing attention head diversity, including using different attention mechanisms for each head and auxiliary training loss functions to promote head diversity.
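
One common way to encode such an auxiliary objective is to penalize pairwise similarity between head outputs; a hedged sketch (the paper's exact loss functions may differ):

```python
import torch
import torch.nn.functional as F

def head_diversity_penalty(head_outputs):
    """head_outputs: tensor of shape (num_heads, T, d).
    Returns the mean pairwise cosine similarity between time-averaged head
    outputs; adding it to the training loss encourages heads to differ
    (an illustrative regularizer, not necessarily the paper's)."""
    pooled = F.normalize(head_outputs.mean(dim=1), dim=-1)   # (H, d) unit vectors
    sim = pooled @ pooled.t()                                # (H, H) cosine similarities
    h = pooled.shape[0]
    off_diag = sim - torch.eye(h, device=sim.device)         # drop the self-similarity of 1
    return off_diag.sum() / (h * (h - 1))                    # mean over ordered pairs
```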

Automatic Speech Recognition (ASR) +1

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

no code implementations16 May 2022 Alëna Aksënova, Zhehuai Chen, Chung-Cheng Chiu, Daan van Esch, Pavel Golik, Wei Han, Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang

However, there are not enough data sets for accented speech, and for the ones that are already available, more training approaches need to be explored to improve the quality of accented speech recognition.

Accented Speech Recognition, Benchmarking +1

Improving Rare Word Recognition with LM-aware MWER Training

no code implementations15 Apr 2022 Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Cal Peyser, Trevor Strohman, Yanzhang He, David Rybach

Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training, when used in either the shallow fusion or the rescoring setups.
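
For reference, the two integration modes mentioned above, sketched with hypothetical scoring functions and an illustrative LM weight:

```python
def shallow_fusion_score(e2e_logprob, lm_logprob, lam=0.3):
    """Shallow fusion: interpolate E2E and external-LM log-probabilities
    at each decoding step (lam is a tunable weight; 0.3 is illustrative)."""
    return e2e_logprob + lam * lm_logprob

def rescore_nbest(nbest, lm_score_fn, lam=0.3):
    """Rescoring: re-rank a first-pass n-best list of (hypothesis, e2e_score)
    pairs with an external LM score."""
    return max(nbest, key=lambda hyp: hyp[1] + lam * lm_score_fn(hyp[0]))
```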

MAESTRO: Matched Speech Text Representations through Modality Matching

no code implementations7 Apr 2022 Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Moreno, Ankur Bapna, Heiga Zen

Self-supervised learning from speech signals aims to learn the latent structure inherent in the signal, while self-supervised learning from text attempts to capture lexical information.

Language Modelling, Self-Supervised Learning +3

Ask2Mask: Guided Data Selection for Masked Speech Modeling

no code implementations24 Feb 2022 Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Pedro Moreno

They treat all unsupervised speech samples with equal weight, which hinders learning because not all samples carry information relevant to learning meaningful representations.

Automatic Speech Recognition (ASR) +1

Injecting Text in Self-Supervised Speech Pretraining

no code implementations27 Aug 2021 Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Gary Wang, Pedro Moreno

The proposed method, tts4pretrain, complements the power of contrastive learning in self-supervision with linguistic/lexical representations derived from synthesized speech, effectively learning from untranscribed speech and unspoken text.

Contrastive Learning, Language Modelling +2

Language-agnostic Multilingual Modeling

no code implementations20 Apr 2020 Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark

Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model.

Speech Recognition +1

Speech Recognition with Augmented Synthesized Speech

no code implementations25 Sep 2019 Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu

Recent success of the Tacotron speech synthesis architecture and its variants in producing natural sounding multi-speaker synthesized speech has raised the exciting possibility of replacing expensive, manually transcribed, domain-specific, human speech that is used to train speech recognizers.

Data Augmentation, Robust Speech Recognition +2

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

4 code implementations9 Jul 2019 Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, RJ Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran

We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.

Speech Synthesis, Voice Cloning

Building competitive direct acoustics-to-word models for English conversational speech recognition

no code implementations8 Dec 2017 Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny

This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple.
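
Decoding such a model can indeed be decoder-free; a minimal sketch of greedy CTC collapse over word IDs (no lexicon or external LM involved):

```python
def ctc_greedy_words(frame_word_ids, blank_id=0):
    """Greedy A2W/CTC decoding: best word per frame, collapse repeats, drop blanks."""
    words, prev = [], None
    for w in frame_word_ids:
        if w != prev and w != blank_id:
            words.append(w)
        prev = w
    return words

# Example: per-frame argmax IDs [0, 7, 7, 0, 0, 3, 3, 3, 0] -> [7, 3]
```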

Automatic Speech Recognition (ASR) +3

Language Modeling with Highway LSTM

no code implementations19 Sep 2017 Gakuto Kurata, Bhuvana Ramabhadran, George Saon, Abhinav Sethy

Language models (LMs) based on Long Short Term Memory (LSTM) have shown good gains in many automatic speech recognition tasks.

Automatic Speech Recognition (ASR) +2

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

no code implementations22 Mar 2017 Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo

Our CTC word model achieves a word error rate of 13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or decoder, compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM.

Automatic Speech Recognition (ASR) +3

End-to-End ASR-free Keyword Search from Speech

no code implementations13 Jan 2017 Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, Brian Kingsbury

The first sub-system is a recurrent neural network (RNN)-based acoustic auto-encoder trained to reconstruct the audio through a finite-dimensional representation.
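
A hedged structural sketch of that first sub-system in PyTorch (all dimensions and layer choices are assumptions): an RNN encoder compresses the audio into a fixed-dimensional code and an RNN decoder reconstructs the frames from it.

```python
import torch
import torch.nn as nn

class RNNAcousticAutoencoder(nn.Module):
    """Illustrative RNN auto-encoder: frames -> fixed-size code -> reconstructed frames."""
    def __init__(self, feat_dim=40, hidden=256, code_dim=128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.to_code = nn.Linear(hidden, code_dim)            # finite-dimensional representation
        self.decoder = nn.GRU(code_dim, hidden, batch_first=True)
        self.to_frames = nn.Linear(hidden, feat_dim)

    def forward(self, frames):                                # frames: (B, T, feat_dim)
        _, h = self.encoder(frames)                           # final hidden state (1, B, hidden)
        code = self.to_code(h[-1])                            # (B, code_dim)
        dec_in = code.unsqueeze(1).repeat(1, frames.size(1), 1)
        dec_out, _ = self.decoder(dec_in)
        return self.to_frames(dec_out)                        # trained to reconstruct `frames` (e.g. MSE)
```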

Automatic Speech Recognition (ASR) +2

Invariant Representations for Noisy Speech Recognition

no code implementations27 Nov 2016 Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio

Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training.

Automatic Speech Recognition (ASR) +4

Training variance and performance evaluation of neural networks in speech

no code implementations14 Jun 2016 Ewout van den Berg, Bhuvana Ramabhadran, Michael Picheny

In this work we study variance in the results of neural network training on a wide variety of configurations in automatic speech recognition.

Automatic Speech Recognition (ASR) +1

Diverse Embedding Neural Network Language Models

no code implementations22 Dec 2014 Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran

We propose Diverse Embedding Neural Network (DENN), a novel architecture for language models (LMs).

Language Modelling

Generalized Ambiguity Decomposition for Understanding Ensemble Diversity

no code implementations28 Dec 2013 Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, Shrikanth. S. Narayanan

We present extensions of this decomposition to common regression and classification loss functions, and report a simulation-based analysis of the diversity term and the accuracy of the decomposition.
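
For context, the classical ambiguity decomposition for squared loss (Krogh and Vedelsby) that the paper generalizes: the squared error of the ensemble mean equals the average member error minus a diversity (ambiguity) term.

```latex
\left(\bar{f} - y\right)^2
  = \frac{1}{M}\sum_{i=1}^{M}\left(f_i - y\right)^2
  - \frac{1}{M}\sum_{i=1}^{M}\left(f_i - \bar{f}\right)^2,
\qquad \bar{f} = \frac{1}{M}\sum_{i=1}^{M} f_i .
```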

General Classification, Regression

Accelerating Hessian-free optimization for deep neural networks by implicit preconditioning and sampling

no code implementations5 Sep 2013 Tara N. Sainath, Lior Horesh, Brian Kingsbury, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

This study aims at speeding up Hessian-free training, both by decreasing the amount of data used for training and by reducing the number of Krylov subspace solver iterations used for implicit estimation of the Hessian.

Improvements to deep convolutional neural networks for LVCSR

no code implementations5 Sep 2013 Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline.

Speech Recognition
