Search Results for author: Kunal Dhawan

Found 9 papers, 3 papers with code

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays.

Automatic Speech Recognition, Speaker Diarization, +3

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

This capability offers a tailored training environment for developing neural models suited for speaker diarization and voice activity detection.

Action Detection, Activity Detection, +3

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

no code implementations • 19 Sep 2023 • Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

Discrete audio representation, also known as audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in the audio domain.

Language Modelling, Quantization, +4

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

no code implementations • 11 Sep 2023 • Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

In addition, these findings point to the potential of using LLMs to improve speaker diarization and other speech processing tasks by capturing semantic and contextual cues.

Speaker Diarization

Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer

1 code implementation • 14 Jun 2023 • Kunal Dhawan, Dima Rekesh, Boris Ginsburg

Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.

Automatic Speech Recognition (ASR), +3

Phonetic Word Embeddings

1 code implementation • 30 Sep 2021 • Rahul Sharma, Kunal Dhawan, Balakrishna Pailla

This work presents a novel methodology for calculating the phonetic similarity between words taking motivation from the human perception of sounds.

Benchmarking, Word Embeddings

Towards Adapting NMF Dictionaries Using Total Variability Modeling for Noise-Robust Acoustic Features

1 code implementation • 16 Jul 2019 • Kunal Dhawan, Colin Vaz, Ruchir Travadi, Shrikanth Narayanan

We propose an algorithm to extract noise-robust acoustic features from noisy speech.

Investigating Target Set Reduction for End-to-End Speech Recognition of Hindi-English Code-Switching Data

no code implementations • 15 Jul 2019 • Kunal Dhawan, Ganji Sreeram, Kumar Priyadarshi, Rohit Sinha

End-to-end (E2E) systems are fast replacing the conventional systems in the domain of automatic speech recognition.

Automatic Speech Recognition (ASR), +1

Joint Language Identification of Code-Switching Speech using Attention based E2E Network

no code implementations • 15 Jul 2019 • Sreeram Ganji, Kunal Dhawan, Kumar Priyadarshi, Rohit Sinha

For the automatic recognition of code-switching speech, the conventional approaches often employ an LID system for detecting the languages present within an utterance.

Language Identification
