Search Results for author: Michael Picheny

Found 31 papers, 5 papers with code

Improving Joint Speech-Text Representations Without Alignment

no code implementations • 11 Aug 2023 • Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho

The last year has seen astonishing progress in text-prompted image generation premised on the idea of a cross-modal representation space in which the text and image domains are represented jointly.

Speech Recognition

A Comparison of Semi-Supervised Learning Techniques for Streaming ASR at Scale

no code implementations • 19 Apr 2023 • Cal Peyser, Michael Picheny, Kyunghyun Cho, Rohit Prabhavalkar, Ronny Huang, Tara Sainath

Unpaired text and audio injection have emerged as dominant methods for improving ASR performance in the absence of a large labeled corpus.

Dual Learning for Large Vocabulary On-Device ASR

no code implementations • 11 Jan 2023 • Cal Peyser, Ronny Huang, Tara Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho

Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once.

Towards Disentangled Speech Representations

no code implementations • 28 Aug 2022 • Cal Peyser, Ronny Huang, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho

In this paper, we construct a representation learning task based on joint modeling of ASR and TTS, and seek to learn a representation of audio that disentangles that part of the speech signal that is relevant to transcription from that part which is not.

Disentanglement

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

no code implementations • 18 Nov 2021 • Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

This paper presents initial speech recognition results on "Casual Conversations" -- a publicly released 846-hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Cascaded Multilingual Audio-Visual Learning from Videos

1 code implementation • 8 Nov 2021 • Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

In this paper, we explore self-supervised audio-visual models that learn from instructional videos.

audio-visual learning Retrieval

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

no code implementations • 7 Oct 2021 • Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf

Domain-adversarial training (DAT) and multi-task learning (MTL) are two common approaches for building accent-robust ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

1 code implementation • ICCV 2021 • Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie Boggust, Rameswar Panda, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Michael Picheny, Shih-Fu Chang

Multimodal self-supervised learning is attracting growing attention because it makes it possible not only to train large networks without human supervision but also to search and retrieve data across various modalities.

Ranked #4 on Long Video Retrieval (Background Removed) on YouCook2

Clustering Contrastive Learning +6

Accented Speech Recognition Inspired by Human Perception

no code implementations • 9 Apr 2021 • Xiangyun Chu, Elizabeth Combs, Amber Wang, Michael Picheny

This paper explores methods that are inspired by human perception to evaluate possible performance improvements for recognition of accented speech, with a specific focus on recognizing speech with a novel accent relative to that of the training data.

Accented Speech Recognition Automatic Speech Recognition +2

Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs

1 code implementation • 7 Apr 2021 • Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson Morais

To address the first challenge, we propose a novel system that can predict intents from flexible types of inputs: speech, ASR transcripts, or both.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems

no code implementations • 8 Oct 2020 • Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny

Assuming we have additional text-to-intent data (without speech) available, we investigated two techniques to improve the S2I system: (1) transfer learning, in which acoustic embeddings for intent classification are tied to fine-tuned BERT text embeddings; and (2) data augmentation, in which the text-to-intent data is converted into speech-to-intent data using a multi-speaker text-to-speech system.

Data Augmentation intent-classification +2

AVLnet: Learning Audio-Visual Language Representations from Instructional Videos

1 code implementation • 16 Jun 2020 • Andrew Rouditchenko, Angie Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James Glass

Further, we propose a tri-modal model that jointly processes raw audio, video, and text captions from videos to learn a multi-modal semantic embedding space useful for text-video retrieval.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition

no code implementations • 24 Feb 2020 • Xiaodong Cui, Wei Zhang, Ulrich Finkler, George Saon, Michael Picheny, David Kung

The past decade has witnessed great progress in Automatic Speech Recognition (ASR) due to advances in deep learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Improving Efficiency in Large-Scale Decentralized Distributed Training

no code implementations • 4 Feb 2020 • Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny

Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchronous Parallel SGD (AD-PSGD) form a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks.

speech-recognition Speech Recognition

Identifying Mood Episodes Using Dialogue Features from Clinical Interviews

no code implementations • 29 Sep 2019 • Zakaria Aldeneh, Mimansa Jaiswal, Michael Picheny, Melvin McInnis, Emily Mower Provost

Bipolar disorder, a severe chronic mental illness characterized by pathological mood swings from depression to mania, requires ongoing symptom severity tracking to both guide and measure treatments that are critical for maintaining long-term health.

Challenging the Boundaries of Speech Recognition: The MALACH Corpus

no code implementations • 9 Aug 2019 • Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon

This paper proposes that the community place focus on the MALACH corpus to develop speech recognition systems that are more robust with respect to accents, disfluencies and emotional speech.

speech-recognition Speech Recognition

Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition

no code implementations • 10 Jul 2019 • Xiaodong Cui, Michael Picheny

In this paper we investigate a variant of ESGD for optimization of acoustic models for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

no code implementations • 10 Jul 2019 • Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny

On the commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3X as large as the one used in SSGD, thus enabling training at a much larger scale.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition

no code implementations • 10 Jul 2019 • Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny

In automatic speech recognition (ASR), wideband (WB) and narrowband (NB) speech signals with different sampling rates typically use separate acoustic models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

English Broadcast News Speech Recognition by Humans and Machines

no code implementations • 30 Apr 2019 • Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltan Tuske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko

With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Distributed Deep Learning Strategies For Automatic Speech Recognition

no code implementations • 10 Apr 2019 • Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny

We show that we can train the LSTM model using ADPSGD in 14 hours with 16 NVIDIA P100 GPUs to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test set and a 13.1% WER on the CallHome (CH) test set.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition

no code implementations • 29 Mar 2019 • Shane Settle, Kartik Audhkhasi, Karen Livescu, Michael Picheny

Direct acoustics-to-word (A2W) systems for end-to-end automatic speech recognition are simpler to train, and more efficient to decode with, than sub-word systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

1 code implementation • NeurIPS 2018 • Xiaodong Cui, Wei Zhang, Zoltán Tüske, Michael Picheny

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks.

Evolutionary Algorithms Language Modelling +2

Building competitive direct acoustics-to-word models for English conversational speech recognition

no code implementations • 8 Dec 2017 • Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny

This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Direct Acoustics-to-Word Models for English Conversational Speech Recognition

no code implementations • 22 Mar 2017 • Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo

Our CTC word model achieves a word error rate of 13.0%/18.8% on the Hub5-2000 Switchboard/CallHome test sets without any LM or decoder compared with 9.6%/16.0% for phone-based CTC with a 4-gram LM.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

English Conversational Telephone Speech Recognition by Humans and Machines

no code implementations • 6 Mar 2017 • George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall

This then raises two issues - what IS human performance, and how far down can we still drive speech recognition error rates?

Ranked #3 on Speech Recognition on Switchboard + Hub500

Language Modelling Multi-Task Learning +2

Kernel Approximation Methods for Speech Recognition

no code implementations • 13 Jan 2017 • Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha

First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection.

feature selection speech-recognition +1

Training variance and performance evaluation of neural networks in speech

no code implementations • 14 Jun 2016 • Ewout van den Berg, Bhuvana Ramabhadran, Michael Picheny

In this work we study variance in the results of neural network training on a wide variety of configurations in automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Comparison between Deep Neural Nets and Kernel Acoustic Models for Speech Recognition

no code implementations • 18 Mar 2016 • Zhiyun Lu, Dong Guo, Alireza Bagheri Garakani, Kuan Liu, Avner May, Aurelien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, Fei Sha

We study large-scale kernel methods for acoustic modeling and compare to DNNs on performance metrics related to both acoustic modeling and recognition.

General Classification Model Selection +2

The IBM 2015 English Conversational Telephone Speech Recognition System

no code implementations • 21 May 2015 • George Saon, Hong-Kwang J. Kuo, Steven Rennie, Michael Picheny

We describe the latest improvements to the IBM English conversational telephone speech recognition system.

Ranked #11 on Speech Recognition on Switchboard + Hub500

Language Modelling speech-recognition +1

How to Scale Up Kernel Methods to Be As Good As Deep Neural Nets

no code implementations • 14 Nov 2014 • Zhiyun Lu, Avner May, Kuan Liu, Alireza Bagheri Garakani, Dong Guo, Aurélien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, Fei Sha

The computational complexity of kernel methods has often been a major barrier for applying them to large-scale learning problems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
