Search Results for author: Xiaodong Cui

Found 34 papers, 3 papers with code

Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis

no code implementations 23 Feb 2024 Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen

Despite the empirical success, the mechanics of how a Transformer is trained to achieve ICL, and the corresponding ICL capacity, remain mostly elusive due to the technical challenges of analyzing the nonconvex training problems that arise from the nonlinear self-attention and nonlinear activation in Transformers.

Binary Classification In-Context Learning

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

1 code implementation 13 Jan 2024 A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen

In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR), which we term bi-level joint unsupervised and supervised training (BL-JUST).

Automatic Speech Recognition (ASR) +2
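The alternating structure of bilevel training can be conveyed with a toy sketch; the quadratic objectives below are hypothetical placeholders, not the paper's actual unsupervised and supervised ASR losses:

```python
# Toy sketch of alternating bilevel training: the lower level follows the
# gradient of an unsupervised objective, the upper level the gradient of a
# supervised objective. Both quadratics are illustrative placeholders.
def grad_unsup(theta):          # d/dtheta of (theta - 1)^2
    return 2.0 * (theta - 1.0)

def grad_sup(theta):            # d/dtheta of (theta - 3)^2
    return 2.0 * (theta - 3.0)

theta, lr = 0.0, 0.1
for _ in range(200):
    theta -= lr * grad_unsup(theta)   # lower-level (unsupervised) step
    theta -= lr * grad_sup(theta)     # upper-level (supervised) step

print(round(theta, 2))   # → 2.11 (settles between the two minimizers)
```

With equal step sizes the iterate settles between the two minimizers; a genuine bilevel formulation instead nests the levels so the supervised objective is optimized subject to the unsupervised solution.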

Soft Random Sampling: A Theoretical and Empirical Analysis

no code implementations 21 Nov 2023 Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury

Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data.

Automatic Speech Recognition +1
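One plausible reading of soft random sampling is a per-epoch subset drawn with replacement at a fixed rate; the `rate` parameter and the with-replacement draw below are assumptions for illustration, not necessarily the paper's exact scheme:

```python
import random

def soft_random_sample(dataset, rate, rng):
    """Draw a per-epoch subset with replacement at a fixed rate -- a
    minimal sketch of soft random sampling."""
    k = int(len(dataset) * rate)
    return [rng.choice(dataset) for _ in range(k)]

data = list(range(1000))
subset = soft_random_sample(data, rate=0.3, rng=random.Random(0))
print(len(subset))   # → 300
```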

How Can Context Help? Exploring Joint Retrieval of Passage and Personalized Context

no code implementations 26 Aug 2023 Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky

The integration of external personalized context information into document-grounded conversational systems has significant potential business value, but has not been well studied.

Passage Retrieval Retrieval

Diagonal State Space Augmented Transformers for Speech Recognition

no code implementations 27 Feb 2023 George Saon, Ankit Gupta, Xiaodong Cui

We improve on the popular conformer architecture by replacing the depthwise temporal convolutions with diagonal state space (DSS) models.

Speech Recognition
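A diagonal state matrix lets the state space model be applied as a 1-D convolution whose kernel is a weighted sum of decaying exponentials, which is roughly what a DSS layer computes. The sketch below uses random placeholder parameters and omits the conformer integration entirely:

```python
import numpy as np

def dss_kernel(log_neg_real, imag, weights, length, dt=1.0):
    """Convolution kernel of a diagonal SSM: a weighted sum of decaying
    complex exponentials (parameters here are random placeholders)."""
    lam = -np.exp(log_neg_real) + 1j * imag        # stable diagonal poles
    t = np.arange(length) * dt
    return (weights[:, None] * np.exp(lam[:, None] * t[None, :])).sum(0).real

rng = np.random.default_rng(0)
n = 4                                              # state dimension
k = dss_kernel(rng.normal(size=n), rng.normal(size=n),
               rng.normal(size=n), length=16)
x = rng.normal(size=64)                            # input sequence
y = np.convolve(x, k)[: len(x)]                    # causal convolution
print(y.shape)   # → (64,)
```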

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

no code implementations 29 Mar 2022 Xiaodong Cui, George Saon, Tohru Nagano, Masayuki Suzuki, Takashi Fukuda, Brian Kingsbury, Gakuto Kurata

We introduce two techniques, length perturbation and n-best based label smoothing, to improve generalization of deep neural network (DNN) acoustic models for automatic speech recognition (ASR).

Automatic Speech Recognition (ASR) +2
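Length perturbation can be sketched as random frame dropping and duplication; the rates and the per-frame independence below are illustrative assumptions, not the paper's schedule:

```python
import random

def length_perturb(frames, drop_prob=0.1, repeat_prob=0.1, rng=random):
    """Randomly drop or duplicate acoustic frames to perturb utterance
    length -- a schematic of length perturbation."""
    out = []
    for f in frames:
        if rng.random() < drop_prob:
            continue              # shorten: drop this frame
        out.append(f)
        if rng.random() < repeat_prob:
            out.append(f)         # lengthen: repeat this frame
    return out

utt = list(range(100))
perturbed = length_perturb(utt, rng=random.Random(0))
print(len(perturbed))             # length varies around 100
```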

Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent

no code implementations 2 Dec 2021 Wei Zhang, Mingrui Liu, Yu Feng, Xiaodong Cui, Brian Kingsbury, Yuhai Tu

We conduct extensive studies over 18 state-of-the-art DL models/tasks and demonstrate that DPSGD often converges in cases where SSGD diverges for large learning rates in the large batch setting.

Automatic Speech Recognition (ASR) +1

Asynchronous Decentralized Distributed Training of Acoustic Models

no code implementations 21 Oct 2021 Xiaodong Cui, Wei Zhang, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung

Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme.

Automatic Speech Recognition (ASR) +1
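A synchronous ring-averaging round conveys the communication pattern; the asynchronous and delay-by-one variants studied in the paper relax this lockstep schedule, and the three-way neighbor average below is a simplification:

```python
import numpy as np

def ring_round(models, grads, lr=0.1):
    """One synchronous round: each worker takes a local SGD step, then
    averages its model with its two ring neighbors."""
    n = len(models)
    stepped = [m - lr * g for m, g in zip(models, grads)]
    return [(stepped[(i - 1) % n] + stepped[i] + stepped[(i + 1) % n]) / 3.0
            for i in range(n)]

models = [np.array([float(i)]) for i in range(4)]   # 4 workers on a ring
grads = [np.zeros(1) for _ in range(4)]             # placeholder gradients
models = ring_round(models, grads)
print(sum(m[0] for m in models))   # neighbor averaging preserves the mean
```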

4-bit Quantization of LSTM-based Speech Recognition Models

no code implementations 27 Aug 2021 Andrea Fasoli, Chia-Yu Chen, Mauricio Serrano, Xiao Sun, Naigang Wang, Swagath Venkataramani, George Saon, Xiaodong Cui, Brian Kingsbury, Wei Zhang, Zoltán Tüske, Kailash Gopalakrishnan

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts).

Automatic Speech Recognition (ASR) +2
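A minimal symmetric int4 weight quantizer conveys how little precision 4 bits allows; the single per-tensor scale below is a simplification of the paper's actual schemes for DBLSTM-HMM and RNN-T models:

```python
import numpy as np

def quantize_int4(w):
    """Symmetric 4-bit quantization to the int4 range [-7, 7] with one
    per-tensor scale -- a minimal sketch, not the paper's scheme."""
    m = np.abs(w).max()
    scale = m / 7.0 if m > 0 else 1.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(8, 8)).astype(np.float32)
q, scale = quantize_int4(w)
err = np.abs(dequantize(q, scale) - w).max()
print(int(q.min()), int(q.max()))   # codes stay within [-7, 7]
```

The worst-case reconstruction error of this scheme is half the scale, which is why aggressive low-precision training and inference need careful per-layer calibration.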

Reducing Exposure Bias in Training Recurrent Neural Network Transducers

no code implementations 24 Aug 2021 Xiaodong Cui, Brian Kingsbury, George Saon, David Haws, Zoltán Tüske

By reducing the exposure bias, we show that we can further improve the accuracy of a high-performance RNNT ASR model and obtain state-of-the-art results on the 300-hour Switchboard dataset.

Automatic Speech Recognition (ASR) +2

On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation

no code implementations ACL 2021 Wei Zhang, Ziming Huang, Yada Zhu, Guangnan Ye, Xiaodong Cui, Fan Zhang

In the recent advances of natural language processing, the scale of the state-of-the-art models and datasets is usually extensive, which challenges the application of sample-based explanation methods in many aspects, such as explanation interpretability, efficiency, and faithfulness.

Federated Acoustic Modeling For Automatic Speech Recognition

no code implementations 8 Feb 2021 Xiaodong Cui, Songtao Lu, Brian Kingsbury

In this paper, we investigate federated acoustic modeling using data from multiple clients.

Federated Learning · Speech Recognition · Sound · Distributed, Parallel, and Cluster Computing · Audio and Speech Processing
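The aggregation step of federated acoustic modeling can be sketched as size-weighted model averaging in the FedAvg style; the local "updates" below are placeholder perturbations, not real client-side ASR training:

```python
import numpy as np

def federated_average(client_models, client_sizes):
    """Server-side aggregation: average client models weighted by their
    local data sizes (FedAvg-style sketch)."""
    total = sum(client_sizes)
    return sum(m * (s / total) for m, s in zip(client_models, client_sizes))

global_model = np.zeros(4)
sizes = [100, 300, 600]                           # per-client data sizes
clients = [global_model + i for i in range(3)]    # stand-in local updates
global_model = federated_average(clients, sizes)
print(global_model)   # → [1.5 1.5 1.5 1.5]
```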

Speech Emotion Recognition with Multiscale Area Attention and Data Augmentation

no code implementations 3 Feb 2021 Mingke Xu, Fan Zhang, Xiaodong Cui, Wei Zhang

In this paper, we apply multiscale area attention in a deep convolutional neural network to attend to emotional characteristics at varied granularities, so that the classifier can benefit from an ensemble of attentions with different scales.

Data Augmentation Speech Emotion Recognition
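Area attention over a 1-D sequence can be sketched by pooling keys and values over contiguous spans of several widths and attending over all of them at once; the paper applies 2-D areas to spectrogram features, so this is illustrative only:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def area_attention(query, keys, values, max_width=3):
    """Attend over all contiguous spans ("areas") up to max_width, each
    represented by its mean key/value -- a 1-D sketch of area attention."""
    area_k, area_v = [], []
    n = len(keys)
    for w in range(1, max_width + 1):
        for i in range(n - w + 1):
            area_k.append(keys[i:i + w].mean(0))
            area_v.append(values[i:i + w].mean(0))
    area_k, area_v = np.stack(area_k), np.stack(area_v)
    attn = softmax(area_k @ query)      # one weight per candidate area
    return attn @ area_v

rng = np.random.default_rng(0)
out = area_attention(rng.normal(size=4), rng.normal(size=(10, 4)),
                     rng.normal(size=(10, 4)))
print(out.shape)   # → (4,)
```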

Probing quasi-long-range ordering by magnetostriction in monolayer CoPS3

no code implementations 4 Jan 2021 Qiye Liu, Le Wang, Ying Fu, Xi Zhang, Lianglong Huang, Huimin Su, Junhao Lin, Xiaobin Chen, Dapeng Yu, Xiaodong Cui, Jia-Wei Mei, Jun-Feng Dai

The Mermin-Wagner-Coleman theorem predicts no long-range magnetic order at finite temperature in two-dimensional (2D) isotropic systems, but permits a quasi-long-range order with a divergent correlation length at the Kosterlitz-Thouless (KT) transition for planar magnets.

Mesoscale and Nanoscale Physics

Ultra-Low Precision 4-bit Training of Deep Neural Networks

no code implementations NeurIPS 2020 Xiao Sun, Naigang Wang, Chia-Yu Chen, Jiamin Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi (Viji) Srinivasan, Kailash Gopalakrishnan

In this paper, we propose a number of novel techniques and numerical representation formats that enable, for the very first time, the precision of training systems to be aggressively scaled from 8-bits to 4-bits.

Quantization

Map Generation from Large Scale Incomplete and Inaccurate Data Labels

no code implementations 20 May 2020 Rui Zhang, Conrad Albrecht, Wei Zhang, Xiaodong Cui, Ulrich Finkler, David Kung, Siyuan Lu

Accurately and globally mapping human infrastructure is an important and challenging task with applications in routing, regulation compliance monitoring, and natural disaster response management.

Disaster Response Management

Improving Efficiency in Large-Scale Decentralized Distributed Training

no code implementations 4 Feb 2020 Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny

Decentralized Parallel SGD (D-PSGD) and its asynchronous variant Asynchronous Parallel SGD (AD-PSGD) form a family of distributed learning algorithms that have been demonstrated to perform well for large-scale deep learning tasks.

Speech Recognition

Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

no code implementations ICLR 2020 Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang

Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an \emph{improved} adaptive complexity $O\left(\epsilon^{-\frac{2}{1-\alpha}}\right)$, where $\alpha$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq \alpha\leq 1/2$.

A Decentralized Parallel Algorithm for Training Generative Adversarial Nets

no code implementations NeurIPS 2020 Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jerret Ross, Tianbao Yang, Payel Das

Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner.

Task-Based Learning via Task-Oriented Prediction Network with Applications in Finance

no code implementations 17 Oct 2019 Di Chen, Yada Zhu, Xiaodong Cui, Carla P. Gomes

Real-world applications often involve domain-specific and task-based performance objectives that are not captured by the standard machine learning losses, but are critical for decision making.

Decision Making

Challenging the Boundaries of Speech Recognition: The MALACH Corpus

no code implementations 9 Aug 2019 Michael Picheny, Zoltán Tüske, Brian Kingsbury, Kartik Audhkhasi, Xiaodong Cui, George Saon

This paper proposes that the community place focus on the MALACH corpus to develop speech recognition systems that are more robust with respect to accents, disfluencies and emotional speech.

Speech Recognition

Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition

no code implementations 10 Jul 2019 Khoi-Nguyen C. Mac, Xiaodong Cui, Wei Zhang, Michael Picheny

In automatic speech recognition (ASR), wideband (WB) and narrowband (NB) speech signals with different sampling rates typically use separate acoustic models.

Automatic Speech Recognition (ASR) +2

A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition

no code implementations 10 Jul 2019 Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny

On the commonly used public SWB-300 and SWB-2000 ASR datasets, ADPSGD can converge with a batch size 3X as large as the one used in SSGD, thus enabling training at a much larger scale.

Automatic Speech Recognition (ASR) +1

Distributed Deep Learning Strategies For Automatic Speech Recognition

no code implementations 10 Apr 2019 Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny

We show that we can train the LSTM model using ADPSGD in 14 hours with 16 NVIDIA P100 GPUs to reach a 7.6% WER on the Hub5-2000 Switchboard (SWB) test set and a 13.1% WER on the CallHome (CH) test set.

Automatic Speech Recognition (ASR) +1

Embedding-Based Speaker Adaptive Training of Deep Neural Networks

no code implementations 17 Oct 2017 Xiaodong Cui, Vaibhava Goel, George Saon

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling.

Speech Recognition

Dilated Recurrent Neural Networks

2 code implementations NeurIPS 2017 Shiyu Chang, Yang Zhang, Wei Han, Mo Yu, Xiaoxiao Guo, Wei Tan, Xiaodong Cui, Michael Witbrock, Mark Hasegawa-Johnson, Thomas S. Huang

To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures.

Sequential Image Classification
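The long skip connections that the mean-recurrent-length measure is designed for can be sketched as a single dilated recurrent layer, where the hidden state at step t is updated from the state d steps back; the plain tanh cell and random weights below are illustrative assumptions:

```python
import numpy as np

def dilated_rnn_layer(x, dilation, hidden, rng):
    """One dilated recurrent layer: h[t] depends on h[t - dilation],
    a skip-length-d recurrence, using a simple tanh cell."""
    W = rng.normal(size=(hidden, x.shape[1])) * 0.1   # input weights
    U = rng.normal(size=(hidden, hidden)) * 0.1       # recurrent weights
    h = np.zeros((len(x), hidden))
    for t in range(len(x)):
        prev = h[t - dilation] if t >= dilation else np.zeros(hidden)
        h[t] = np.tanh(W @ x[t] + U @ prev)
    return h

rng = np.random.default_rng(0)
h = dilated_rnn_layer(rng.normal(size=(32, 8)), dilation=4, hidden=16, rng=rng)
print(h.shape)   # → (32, 16)
```

Stacking such layers with exponentially increasing dilations is what gives the architecture its long effective memory.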
