no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg
We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays.
no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg
This capability offers a tailored training environment for developing neural models suited for speaker diarization and voice activity detection.
no code implementations • 11 Sep 2023 • Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam
In addition, these findings point to the potential of using LLMs to improve speaker diarization and other speech processing tasks by capturing semantic and contextual cues.
no code implementations • 30 Mar 2022 • Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
First, we use multi-scale clustering as an initialization to estimate the number of speakers and obtain the average speaker representation vector for each speaker and each scale.
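As a rough illustration of the initialization step described above, the sketch below averages embeddings per speaker at each scale. It assumes that cluster labels have already been produced by some clustering step; the function name `average_speaker_vectors` and the input layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def average_speaker_vectors(embeddings_per_scale, labels_per_scale):
    """Return {scale_index: {speaker_label: mean_vector}}.

    embeddings_per_scale: one list of embedding vectors per scale.
    labels_per_scale: one list of cluster labels per scale, aligned
    with the embeddings (assumed to come from a prior clustering step).
    """
    averages = {}
    for scale, (embs, labels) in enumerate(
        zip(embeddings_per_scale, labels_per_scale)
    ):
        sums = {}
        counts = defaultdict(int)
        for vec, spk in zip(embs, labels):
            if spk not in sums:
                sums[spk] = [0.0] * len(vec)
            # Accumulate the element-wise sum for this speaker.
            sums[spk] = [s + v for s, v in zip(sums[spk], vec)]
            counts[spk] += 1
        averages[scale] = {
            spk: [s / counts[spk] for s in total]
            for spk, total in sums.items()
        }
    return averages


# Toy input: two scales, two speakers, 2-dimensional embeddings.
embeddings = [
    [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]],  # scale 0
    [[0.0, 4.0], [2.0, 2.0]],              # scale 1
]
labels = [
    [0, 0, 1],
    [1, 0],
]
result = average_speaker_vectors(embeddings, labels)
estimated_num_speakers = len(result[0])
```

Here the number of distinct labels at a scale stands in for the estimated speaker count; an actual system would derive it from the clustering itself.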
no code implementations • 19 Oct 2021 • Tae Jin Park, Kenichi Kumatani, Dimitrios Dimitriadis
Federated Learning is a fast-growing area of ML in which the training datasets are extremely distributed and change dynamically over time.
no code implementations • 24 Jan 2021 • Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan
Speaker diarization is the task of labeling audio or video recordings with classes that correspond to speaker identity, or in short, the task of identifying "who spoke when".
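To make the "who spoke when" output concrete, here is a minimal sketch that turns per-frame speaker labels into time-stamped segments. The function name `frames_to_segments`, the `frame_shift` parameter, and the use of `None` for non-speech frames are illustrative assumptions, not part of any system described in this listing.

```python
def frames_to_segments(frame_labels, frame_shift=0.01):
    """Merge runs of identical frame labels into (start, end, speaker) tuples.

    frame_labels: one speaker label per frame, or None for non-speech.
    frame_shift: assumed duration of one frame in seconds.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run at the end of input or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            if frame_labels[start] is not None:
                segments.append(
                    (start * frame_shift, i * frame_shift, frame_labels[start])
                )
            start = i
    return segments


# Toy input: speaker A, a non-speech frame, then speaker B.
segments = frames_to_segments(["A", "A", None, "B", "B", "B"], frame_shift=0.5)
```

Each tuple answers "who" with its label and "when" with its start and end times in seconds.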
no code implementations • 13 Apr 2020 • Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bo-Wen Zhou, Panayiotis Georgiou, Shrikanth Narayanan
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
1 code implementation • 5 Mar 2020 • Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan
In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.
Ranked #1 on Speaker Diarization on CALLHOME (DER(ig olp) metric)
no code implementations • 27 Nov 2018 • Tae Jin Park, Kyu Han, Ian Lane, Panayiotis Georgiou
This work presents a novel approach to leverage lexical information for speaker diarization.