no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg
We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task. The system is a multi-channel, multi-speaker recognizer tailored to transcribe speech captured by distributed microphones and microphone arrays.
no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg
This capability offers a tailored training environment for developing neural models suited for speaker diarization and voice activity detection.
no code implementations • 19 Sep 2023 • Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg
Discrete audio representation, also known as audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in the audio domain.
no code implementations • 11 Sep 2023 • Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam
In addition, these findings point to the potential of using LLMs to improve speaker diarization and other speech processing tasks by capturing semantic and contextual cues.
1 code implementation • 14 Jun 2023 • Kunal Dhawan, Dima Rekesh, Boris Ginsburg
Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.
1 code implementation • 30 Sep 2021 • Rahul Sharma, Kunal Dhawan, Balakrishna Pailla
This work presents a novel methodology for calculating the phonetic similarity between words, motivated by the human perception of sounds.
1 code implementation • 16 Jul 2019 • Kunal Dhawan, Colin Vaz, Ruchir Travadi, Shrikanth Narayanan
We propose an algorithm to extract noise-robust acoustic features from noisy speech.
no code implementations • 15 Jul 2019 • Kunal Dhawan, Ganji Sreeram, Kumar Priyadarshi, Rohit Sinha
End-to-end (E2E) systems are rapidly replacing conventional systems in the domain of automatic speech recognition.
no code implementations • 15 Jul 2019 • Sreeram Ganji, Kunal Dhawan, Kumar Priyadarshi, Rohit Sinha
For the automatic recognition of code-switching speech, conventional approaches often employ a language identification (LID) system to detect the languages present within an utterance.