no code implementations • 4 Oct 2023 • Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg
Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format.
Automatic Speech Recognition (ASR)
no code implementations • 23 Sep 2023 • Yang Zhang, Travis M. Bartley, Mariana Graterol-Fuenmayor, Vitaly Lavrukhin, Evelina Bakhturina, Boris Ginsburg
Through this new framework, we can identify strengths and weaknesses of GPT-based TN, opening opportunities for future work.
2 code implementations • 9 Aug 2023 • Yang Zhang, Krishna C. Puvvada, Vitaly Lavrukhin, Boris Ginsburg
We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR).
no code implementations • 27 Jun 2023 • Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg
Second, we demonstrate that it is possible to combine base and adapted models to achieve strong results on both original and target data.
no code implementations • 27 Feb 2023 • Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg
We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both.
Automatic Speech Recognition (ASR)
no code implementations • 6 Oct 2022 • Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg
Automatic speech recognition models are often adapted to improve their accuracy in a new domain.
Automatic Speech Recognition (ASR)
1 code implementation • 11 Apr 2021 • Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg
Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets.
Automatic Speech Recognition (ASR)
1 code implementation • 5 Apr 2021 • Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.
Ranked #3 on Speech Recognition on SPGISpeech
no code implementations • 5 Apr 2021 • Somshubra Majumdar, Jagadeesh Balam, Oleksii Hrinchuk, Vitaly Lavrukhin, Vahid Noroozi, Boris Ginsburg
We propose Citrinet - a new end-to-end convolutional Connectionist Temporal Classification (CTC) based automatic speech recognition (ASR) model.
Automatic Speech Recognition (ASR)
no code implementations • 3 Apr 2021 • Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang
This paper introduces a new multi-speaker English dataset for training text-to-speech models.
no code implementations • ICLR 2020 • Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen
We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.
15 code implementations • 22 Oct 2019 • Samuel Kriman, Stanislav Beliaev, Boris Ginsburg, Jocelyn Huang, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Yang Zhang
We propose a new end-to-end neural acoustic model for automatic speech recognition.
Ranked #33 on Speech Recognition on LibriSpeech test-clean
Speech Recognition, Audio and Speech Processing
1 code implementation • 14 Sep 2019 • Oleksii Kuchaiev, Jason Li, Huyen Nguyen, Oleksii Hrinchuk, Ryan Leary, Boris Ginsburg, Samuel Kriman, Stanislav Beliaev, Vitaly Lavrukhin, Jack Cook, Patrice Castonguay, Mariya Popova, Jocelyn Huang, Jonathan M. Cohen
NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications through re-usability, abstraction, and composition.
Ranked #1 on Speech Recognition on Common Voice Spanish (using extra training data)
Automatic Speech Recognition (ASR)
3 code implementations • 27 May 2019 • Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen
We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.
10 code implementations • 5 Apr 2019 • Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde
In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.
Ranked #3 on Speech Recognition on Hub5'00 SwitchBoard
no code implementations • 2 Nov 2018 • Jason Li, Ravi Gadde, Boris Ginsburg, Vitaly Lavrukhin
Building an accurate automatic speech recognition (ASR) system requires a large dataset that contains many hours of labeled speech samples produced by a diverse set of speakers.
Automatic Speech Recognition (ASR)
no code implementations • WS 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Carl Case, Paulius Micikevicius
We present OpenSeq2Seq - an open-source toolkit for training sequence-to-sequence models.
Automatic Speech Recognition (ASR)
3 code implementations • 25 May 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius
We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
Automatic Speech Recognition (ASR)