no code implementations • IWSLT (ACL) 2022 • Oleksii Hrinchuk, Vahid Noroozi, Ashwinkumar Ganesan, Sarah Campbell, Sandeep Subramanian, Somshubra Majumdar, Oleksii Kuchaiev
Our cascade system consists of 1) a Conformer RNN-T automatic speech recognition model, 2) a punctuation-capitalization model based on a pre-trained T5 encoder, and 3) an ensemble of Transformer neural machine translation models fine-tuned on TED talks.
Automatic Speech Recognition, Automatic Speech Recognition (ASR), +5
1 code implementation • 27 Dec 2023 • Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg
We also showed that training a model with multiple latencies can achieve better accuracy than single-latency models while letting a single model support multiple latencies.
no code implementations • 18 Sep 2023 • Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg
This paper presents an overview and evaluation of some end-to-end ASR models on long-form audio.
Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1
no code implementations • 8 May 2023 • Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg
Conformer-based models have become the dominant end-to-end architecture for speech processing tasks.
Ranked #1 on Speech Recognition on LibriSpeech test-other
1 code implementation • 13 Apr 2023 • Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg
TDT models for Speech Recognition achieve better accuracy and up to 2.82X faster inference than conventional Transducers.
Intent Classification, Intent Classification and Slot Filling, +3
1 code implementation • 4 Nov 2022 • Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg
This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition (ASR).
Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1
no code implementations • 6 Oct 2022 • Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg
Automatic speech recognition models are often adapted to improve their accuracy in a new domain.
Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2
no code implementations • 6 Oct 2021 • Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg
This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition.
Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1
no code implementations • 22 Jul 2021 • Aleksei Kalinov, Somshubra Majumdar, Jagadeesh Balam, Boris Ginsburg
The basic idea is to introduce a parallel mixture of shallow networks instead of a very deep network.
Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1
1 code implementation • 5 Apr 2021 • Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.
Ranked #3 on Speech Recognition on SPGISpeech
no code implementations • 5 Apr 2021 • Somshubra Majumdar, Jagadeesh Balam, Oleksii Hrinchuk, Vitaly Lavrukhin, Vahid Noroozi, Boris Ginsburg
We propose Citrinet - a new end-to-end convolutional Connectionist Temporal Classification (CTC) based automatic speech recognition (ASR) model.
Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1
2 code implementations • 27 Feb 2019 • Fazle Karim, Somshubra Majumdar, Houshang Darabi
In this paper, we propose utilizing an adversarial transformation network (ATN) on a distilled model to attack various time series classification models.
4 code implementations • 27 Feb 2019 • Fazle Karim, Somshubra Majumdar, Houshang Darabi
In this paper, we perform a series of ablation tests (3627 experiments) on LSTM-FCN and ALSTM-FCN to provide a better understanding of the model and each of its sub-modules.
no code implementations • 19 Dec 2018 • Maryam Pishgar, Fazle Karim, Somshubra Majumdar, Houshang Darabi
Vocal disorders have affected several patients all over the world.
no code implementations • 3 Jun 2018 • Somshubra Majumdar, Amlaan Bhoi, Ganesh Jagadeesan
Style transfer aims to transfer arbitrary visual styles to content images.
7 code implementations • 14 Jan 2018 • Fazle Karim, Somshubra Majumdar, Houshang Darabi, Samuel Harford
Over the past decade, multivariate time series classification has received great attention.
Ranked #1 on Time Series Classification on CharacterTrajectories
9 code implementations • 8 Sep 2017 • Fazle Karim, Somshubra Majumdar, Houshang Darabi, Shun Chen
We propose the augmentation of fully convolutional networks with long short-term memory recurrent neural network (LSTM RNN) sub-modules for time series classification.
Ranked #2 on Outlier Detection on ECG5000