Search Results for author: Nithin Rao Koluguri

Found 6 papers, 1 paper with code

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

no code implementations · 19 Sep 2023 · Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

Discrete audio representation, also known as audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in the audio domain.

Language Modelling, Quantization, +4

A Compact End-to-End Model with Local and Global Context for Spoken Language Identification

no code implementations · 27 Oct 2022 · Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

We introduce TitaNet-LID, a compact end-to-end neural network for Spoken Language Identification (LID) that is based on the ContextNet architecture.

Language Identification, Spoken Language Identification

Multi-scale Speaker Diarization with Dynamic Scale Weighting

no code implementations · 30 Mar 2022 · Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

First, we use multi-scale clustering as an initialization to estimate the number of speakers and obtain the average speaker representation vector for each speaker and each scale.

Decoder, Speaker Diarization, +1
