no code implementations • 23 Jan 2024 • Prachi Singh, Sriram Ganapathy
Speaker diarization, the task of segmenting an audio recording based on speaker identity, constitutes an important speech pre-processing step for several downstream applications.
no code implementations • 21 Nov 2023 • Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy
In multilingual societies, where multiple languages are spoken within a small geographic area, informal conversations often involve a mix of languages.
no code implementations • 1 Mar 2023 • Shikha Baghel, Shreyas Ramoji, Sidharth, Ranjana H, Prachi Singh, Somil Jain, Pratik Roy Chowdhuri, Kaustubh Kulkarni, Swapnil Padhi, Deepu Vijayasenan, Sriram Ganapathy
The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing.
no code implementations • 28 Feb 2023 • Prachi Singh, Srikrishna Karanam, Sumit Shekhar
We propose a new problem: retrieving audio files relevant to multimodal design documents that comprise both textual elements and visual imagery, e.g., birthday or greeting cards.
no code implementations • 24 Feb 2023 • Prachi Singh, Amrit Kaul, Sriram Ganapathy
We also propose an approach to jointly update the embedding extractor and the GNN model to perform end-to-end speaker diarization (E2E-SHARC).
1 code implementation • 14 Sep 2021 • Prachi Singh, Sriram Ganapathy
In this paper, we propose an approach that jointly learns the speaker embeddings and the similarity metric using principles of self-supervised learning.
1 code implementation • 19 Apr 2021 • Prachi Singh, Sriram Ganapathy
In this paper, we propose a representation learning and clustering algorithm that can be iteratively performed for improved speaker diarization.
no code implementations • 6 Apr 2021 • Prachi Singh, Rajat Varma, Venkat Krishnamohan, Srikanth Raj Chetupalli, Sriram Ganapathy
This paper describes the challenge submission, the post-evaluation analysis and improvements observed on the DIHARD-III dataset.
3 code implementations • 2 Dec 2020 • Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Church, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Liberman
DIHARD III was the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variability in recording equipment, noise conditions, and conversational domain.
1 code implementation • 10 Aug 2020 • Prachi Singh, Sriram Ganapathy
In this paper, we propose a novel algorithm for hierarchical clustering that combines speaker clustering with a representation learning framework.
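As a rough illustration of the clustering half of the entry above only, the sketch below runs plain average-linkage agglomerative clustering over per-segment speaker embeddings using cosine similarity. The function names, the `threshold` stopping rule, and the fixed cosine similarity (instead of a learned representation or similarity) are assumptions for illustration; this is not the algorithm proposed in the paper.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def ahc(embeddings: List[List[float]], threshold: float) -> List[int]:
    """Average-linkage agglomerative clustering; returns one label per segment.

    Merging stops once the best average similarity between any two clusters
    falls below `threshold` (a hypothetical tuning parameter).
    """
    n = len(embeddings)
    # Pairwise segment-to-segment similarity matrix.
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per segment
    while len(clusters) > 1:
        best, pair = -math.inf, None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average linkage: mean similarity over all cross-cluster pairs.
                s = sum(sim[i][j] for i in clusters[p] for j in clusters[q])
                s /= len(clusters[p]) * len(clusters[q])
                if s > best:
                    best, pair = s, (p, q)
        if best < threshold:
            break  # remaining clusters are treated as distinct speakers
        p, q = pair
        clusters[p].extend(clusters[q])
        del clusters[q]
    labels = [0] * n
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels
```

For example, `ahc([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], 0.5)` groups the first two segments and the last two segments into two speakers.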
no code implementations • 7 Feb 2020 • Shreyas Ramoji, Prashant Krishnan, Bhargavram Mysore, Prachi Singh, Sriram Ganapathy
In this paper, we provide a detailed account of the LEAP SRE system submitted to the CTS challenge, focusing on the novel components of the back-end system modeling.
1 code implementation • 20 Jan 2020 • Shreyas Ramoji, Prashant Krishnan V, Prachi Singh, Sriram Ganapathy
The pre-processing steps of linear discriminant analysis (LDA), unit length normalization and within-class covariance normalization (WCCN) are all modeled as layers of a neural model, so the speaker verification cost functions can be back-propagated through these layers during training.
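To show why these steps can be treated as network layers, the sketch below writes each one as a function of its input: LDA as an affine projection, length normalization as division by the Euclidean norm, and WCCN as a linear map. It is a forward pass only, in plain Python; the matrices `proj` and `wccn` and all function names are hypothetical placeholders, not the paper's implementation. In an autodiff framework the same matrices would become trainable parameters updated by the verification loss.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def lda_layer(x: Vector, mean: Vector, proj: Matrix) -> Vector:
    """Affine layer: centre the embedding, then apply an LDA projection matrix."""
    return matvec(proj, [xi - mi for xi, mi in zip(x, mean)])


def length_norm_layer(x: Vector) -> Vector:
    """Scale the vector to unit Euclidean length (differentiable for x != 0)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def wccn_layer(x: Vector, wccn: Matrix) -> Vector:
    """Linear layer; `wccn` would be derived from the inverse within-class
    covariance (e.g. the transpose of its Cholesky factor)."""
    return matvec(wccn, x)


def backend_forward(x: Vector, mean: Vector, proj: Matrix, wccn: Matrix) -> Vector:
    """Stack the three steps in the order listed: LDA, length norm, WCCN."""
    return wccn_layer(length_norm_layer(lda_layer(x, mean, proj)), wccn)
```

Because every step is built from additions, multiplications, a square root and a division, gradients of a downstream loss can flow back through all three, which is what allows the pre-processing to be trained jointly with the rest of the back-end.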