Search Results for author: Nithin Rao Koluguri

Found 6 papers, 1 paper with code

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

no code implementations • 19 Sep 2023 • Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

Discrete audio representation, aka audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in the audio domain.

Language Modelling Quantization +4

Investigating End-to-End ASR Architectures for Long Form Audio Transcription

no code implementations • 18 Sep 2023 • Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg

This paper presents an overview and evaluation of some of the end-to-end ASR models on long-form audio.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition

no code implementations • 8 May 2023 • Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg

Conformer-based models have become the dominant end-to-end architecture for speech processing tasks.

Ranked #1 on Speech Recognition on LibriSpeech test-other

Automatic Speech Recognition Decoder +4

A Compact End-to-End Model with Local and Global Context for Spoken Language Identification

no code implementations • 27 Oct 2022 • Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

We introduce TitaNet-LID, a compact end-to-end neural network for Spoken Language Identification (LID) that is based on the ContextNet architecture.

Language Identification Spoken language identification

Multi-scale Speaker Diarization with Dynamic Scale Weighting

no code implementations • 30 Mar 2022 • Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

First, we use multi-scale clustering as an initialization to estimate the number of speakers and obtain the average speaker representation vector for each speaker and each scale.

Decoder speaker-diarization +1

TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

2 code implementations • 8 Oct 2021 • Nithin Rao Koluguri, Taejin Park, Boris Ginsburg

In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker representations.

Ranked #1 on Speaker Diarization on CALLHOME-109

speaker-diarization Speaker Diarization +1
