Search Results for author: Kartik Audhkhasi

Found 21 papers, 1 papers with code

O-1: Self-training with Oracle and 1-best Hypothesis

no code implementations • 14 Aug 2023 • Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi

O-1 achieves 13\% to 25\% relative improvement over EMBR on the various datasets that SpeechStew comprises of, and a 12\% relative gap reduction with respect to the oracle WER over EMBR training on the in-house dataset.

speech-recognition Speech Recognition

Paper
Add Code

Large-scale Language Model Rescoring on Long-form Data

no code implementations • 13 Jun 2023 • Tongzhou Chen, Cyril Allauzen, Yinghui Huang, Daniel Park, David Rybach, W. Ronny Huang, Rodrigo Cabrera, Kartik Audhkhasi, Bhuvana Ramabhadran, Pedro J. Moreno, Michael Riley

In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR.

Language Modelling speech-recognition +1

Paper
Add Code

Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss

no code implementations • 10 Mar 2023 • Mohammad Zeineldeen, Kartik Audhkhasi, Murali Karthick Baskar, Bhuvana Ramabhadran

Soft distillation is another popular KD method that distills the output logits of the teacher model.

Knowledge Distillation

Paper
Add Code

Modular Hybrid Autoregressive Transducer

no code implementations • 31 Oct 2022 • Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno

In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder.

Language Modelling speech-recognition +1

Paper
Add Code

Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition

no code implementations • 13 Sep 2022 • Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno

We investigate a few approaches to increasing attention head diversity, including using different attention mechanisms for each head and auxiliary training loss functions to promote head diversity.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

no code implementations • 8 Oct 2020 • Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny

Assuming we have additional text-to-intent data (without speech) available, we investigated two techniques to improve the S2I system: (1) transfer learning, in which acoustic embeddings for intent classification are tied to fine-tuned BERT text embeddings; and (2) data augmentation, in which the text-to-intent data is converted into speech-to-intent data using a multi-speaker text-to-speech system.

Data Augmentation intent-classification +2

Paper
Add Code

End-to-End Spoken Language Understanding Without Full Transcripts

no code implementations • 30 Sep 2020 • Hong-Kwang J. Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis Lastras

For our speech-to-entities experiments on the ATIS corpus, both the CTC and attention models showed impressive ability to skip non-entity words: there was little degradation when trained on just entities versus full transcripts.

slot-filling Slot Filling +3

Paper
Add Code

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

1 code implementation • 16 Jun 2020 • Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass

Further, we propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper
Code

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard

no code implementations • 20 Jan 2020 • Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury

It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition models are competitive with hybrid models only when a large amount of data, at least a thousand hours, is available for training.

Ranked #2 on Speech Recognition on swb_hub_500 WER fullSWBCH

Data Augmentation Language Modelling +2

Paper
Add Code

Challenging the Boundaries of Speech Recognition: The MALACH Corpus

no code implementations • 9 Aug 2019 • Michael Picheny, Zóltan Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon

This paper proposes that the community place focus on the MALACH corpus to develop speech recognition systems that are more robust with respect to accents, disfluencies and emotional speech.

speech-recognition Speech Recognition

Paper
Add Code

Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation

no code implementations • 17 Apr 2019 • Gakuto Kurata, Kartik Audhkhasi

Conventional automatic speech recognition (ASR) systems trained from frame-level alignments can easily leverage posterior fusion to improve ASR accuracy and build a better single model with knowledge distillation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition

no code implementations • 29 Mar 2019 • Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny

Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train, and more efficient to decode with, than sub-word systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition

no code implementations • 7 Feb 2018 • Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson

The performance of automatic speech recognition systems degrades with increasing mismatch between the training and testing scenarios.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Building competitive direct acoustics-to-word models for English conversational speech recognition

no code implementations • 8 Dec 2017 • Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny

This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

no code implementations • 22 Mar 2017 • Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo

Our CTC word model achieves a word error rate of 13. 0%/18. 8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or decoder compared with 9. 6%/16. 0% for phone-based CTC with a 4-gram LM.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

English Conversational Telephone Speech Recognition by Humans and Machines

no code implementations • 6 Mar 2017 • George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall

This then raises two issues - what IS human performance, and how far down can we still drive speech recognition error rates?

Ranked #3 on Speech Recognition on Switchboard + Hub500

Language Modelling Multi-Task Learning +2

Paper
Add Code

End-to-End ASR-free Keyword Search from Speech

no code implementations • 13 Jan 2017 • Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy, Bhuvana Ramabhadran, Brian Kingsbury

The first sub-system is a recurrent neural network (RNN)-based acoustic auto-encoder trained to reconstruct the audio through a finite-dimensional representation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Invariant Representations for Noisy Speech Recognition

no code implementations • 27 Nov 2016 • Dmitriy Serdyuk, Kartik Audhkhasi, Philémon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio

Ensuring such robustness to variability is a challenge in modern day neural network-based ASR systems, especially when all types of variability are not seen during training.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Diverse Embedding Neural Network Language Models

no code implementations • 22 Dec 2014 • Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran

We propose Diverse Embedding Neural Network (DENN), a novel architecture for language models (LMs).

Language Modelling

Paper
Add Code

Generalized Ambiguity Decomposition for Understanding Ensemble Diversity

no code implementations • 28 Dec 2013 • Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, Shrikanth. S. Narayanan

We present extensions of this decomposition to common regression and classification loss functions, and report a simulation-based analysis of the diversity term and the accuracy of the decomposition.

General Classification regression

Paper
Add Code

Which ASR should I choose for my dialogue system?

no code implementations • WS 2013 • Fabrizio Morbini, Kartik Audhkhasi, Kenji Sagae, Ron artstein, Do{\u{g}}an Can, Panayiotis Georgiou, Shri Narayanan, Anton Leuski, David Traum

Speech Recognition

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.