Search Results for author: Somshubra Majumdar

Found 17 papers, 8 papers with code

NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022

no code implementations • IWSLT (ACL) 2022 • Oleksii Hrinchuk, Vahid Noroozi, Ashwinkumar Ganesan, Sarah Campbell, Sandeep Subramanian, Somshubra Majumdar, Oleksii Kuchaiev

Our cascade system consists of 1) Conformer RNN-T automatic speech recognition model, 2) punctuation-capitalization model based on pre-trained T5 encoder, 3) ensemble of Transformer neural machine translation models fine-tuned on TED talks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition

1 code implementation • 27 Dec 2023 • Vahid Noroozi, Somshubra Majumdar, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

We also showed that training a model with multiple latencies can achieve better accuracy than single-latency models while supporting multiple latencies with a single model.

Automatic Speech Recognition Decoder +2

10,237

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

no code implementations • 18 Sep 2023 • Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg

This paper presents an overview and evaluation of some of the end-to-end ASR models on long-form audios.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

no code implementations • 8 May 2023 • Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks.

Ranked #1 on Speech Recognition on LibriSpeech test-other

Automatic Speech Recognition Decoder +4

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

1 code implementation • 13 Apr 2023 • Hainan Xu, Fei Jia, Somshubra Majumdar, He Huang, Shinji Watanabe, Boris Ginsburg

TDT models for Speech Recognition achieve better accuracy and up to 2.82X faster inference than conventional Transducers.

Ranked #1 on Speech Recognition on facebook/multilingual_librispeech german

Intent Classification Intent Classification and Slot Filling +3

10,237

Multi-blank Transducers for Speech Recognition

1 code implementation • 4 Nov 2022 • Hainan Xu, Fei Jia, Somshubra Majumdar, Shinji Watanabe, Boris Ginsburg

This paper proposes a modification to RNN-Transducer (RNN-T) models for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

10,237

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition

no code implementations • 6 Oct 2022 • Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg

Automatic speech recognition models are often adapted to improve their accuracy in a new domain.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

CTC Variations Through New WFST Topologies

no code implementations • 6 Oct 2021 • Aleksandr Laptev, Somshubra Majumdar, Boris Ginsburg

This paper presents novel Weighted Finite-State Transducer (WFST) topologies to implement Connectionist Temporal Classification (CTC)-like algorithms for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

CarneliNet: Neural Mixture Model for Automatic Speech Recognition

no code implementations • 22 Jul 2021 • Aleksei Kalinov, Somshubra Majumdar, Jagadeesh Balam, Boris Ginsburg

The basic idea is to introduce a parallel mixture of shallow networks instead of a very deep network.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

1 code implementation • 5 Apr 2021 • Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko

In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.

Ranked #3 on Speech Recognition on SPGISpeech

speech-recognition Speech Recognition

7,952

Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition

no code implementations • 5 Apr 2021 • Somshubra Majumdar, Jagadeesh Balam, Oleksii Hrinchuk, Vitaly Lavrukhin, Vahid Noroozi, Boris Ginsburg

We propose Citrinet - a new end-to-end convolutional Connectionist Temporal Classification (CTC) based automatic speech recognition (ASR) model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Adversarial Attacks on Time Series

2 code implementations • 27 Feb 2019 • Fazle Karim, Somshubra Majumdar, Houshang Darabi

In this paper, we propose utilizing an adversarial transformation network (ATN) on a distilled model to attack various time series classification models.

Classification Dynamic Time Warping +4

Insights into LSTM Fully Convolutional Networks for Time Series Classification

4 code implementations • 27 Feb 2019 • Fazle Karim, Somshubra Majumdar, Houshang Darabi

In this paper, we perform a series of ablation tests (3627 experiments) on LSTM-FCN and ALSTM-FCN to provide a better understanding of the model and each of its sub-modules.

Classification General Classification +3

726

Pathological Voice Classification Using Mel-Cepstrum Vectors and Support Vector Machine

no code implementations • 19 Dec 2018 • Maryam Pishgar, Fazle Karim, Somshubra Majumdar, Houshang Darabi

Vocal disorders have affected several patients all over the world.

General Classification

A Comprehensive Comparison between Neural Style Transfer and Universal Style Transfer

no code implementations • 3 Jun 2018 • Somshubra Majumdar, Amlaan Bhoi, Ganesh Jagadeesan

Style transfer aims to transfer arbitrary visual styles to content images.

Style Transfer

Multivariate LSTM-FCNs for Time Series Classification

7 code implementations • 14 Jan 2018 • Fazle Karim, Somshubra Majumdar, Houshang Darabi, Samuel Harford

Over the past decade, multivariate time series classification has received great attention.

Ranked #1 on Time Series Classification on CharacterTrajectories

Action Recognition General Classification +4

4,766

LSTM Fully Convolutional Networks for Time Series Classification

9 code implementations • 8 Sep 2017 • Fazle Karim, Somshubra Majumdar, Houshang Darabi, Shun Chen

We propose the augmentation of fully convolutional networks with long short term memory recurrent neural network (LSTM RNN) sub-modules for time series classification.

Ranked #2 on Outlier Detection on ECG5000

General Classification Outlier Detection +3

4,765
