Speaker Diarization

74 papers with code • 12 benchmarks • 11 datasets

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
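As a rough illustration of the definition above, the sketch below represents a diarization result as time segments carrying anonymous speaker indices, derives the speaker count as a by-product, and attaches speaker labels to timestamped words from a recognizer. The `Segment` class, the `spk0`/`spk1` labels, and the sample data are all hypothetical and chosen only for illustration; they do not come from any particular library or paper listed here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    speaker: str   # anonymous co-index, not a known identity


def num_speakers(segments):
    """Count distinct speaker indices (a by-product of diarization)."""
    return len({s.speaker for s in segments})


def attribute_words(words, segments):
    """Pair each (word, timestamp) with the speaker whose segment contains it."""
    attributed = []
    for word, t in words:
        speaker = next(
            (s.speaker for s in segments if s.start <= t < s.end), None
        )
        attributed.append((speaker, word))
    return attributed


# Hypothetical diarization output: two speakers, three turns.
segments = [
    Segment(0.0, 2.5, "spk0"),
    Segment(2.5, 4.0, "spk1"),
    Segment(4.0, 6.0, "spk0"),
]
# Hypothetical recognizer output: (word, timestamp in seconds).
words = [("hello", 0.4), ("hi", 2.9), ("bye", 5.1)]

print(num_speakers(segments))
print(attribute_words(words, segments))
```

The first and third segments share the index `spk0` without any claim about who that speaker is, which is the sense in which segments are co-indexed rather than identified.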


Long-term Conversation Analysis: Exploring Utility and Privacy

ol-mega/ppca 28 Jun 2023

The analysis of conversations recorded in everyday life requires privacy protection.

2
28 Jun 2023

Speech Emotion Diarization: Which Emotion Appears When?

speechbrain/speechbrain 22 Jun 2023

Speech Emotion Recognition (SER) typically relies on utterance-level solutions.

7,911
22 Jun 2023

Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks

audio-westlakeu/audiossl 7 Jun 2023

In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.

65
07 Jun 2023

Neural Diarization with Non-autoregressive Intermediate Attractors

hitachi-speech/EEND 13 Mar 2023

The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.

350
13 Mar 2023

TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization

alibaba-damo-academy/FunASR 8 Mar 2023

Recently, end-to-end neural diarization (EEND) has been introduced and has achieved promising results in speaker-overlapped scenarios.

3,417
08 Mar 2023

A Light Weight Model for Active Speaker Detection

junhua-liao/light-asd CVPR 2023

Experimental results on the AVA-ActiveSpeaker dataset show that our framework achieves competitive mAP performance (94.1% vs. 94.2%), while the resource costs are significantly lower than the state-of-the-art method, especially in model parameters (1.0M vs. 22.5M, about 23x) and FLOPs (0.6G vs. 2.6G, about 4x).

86
08 Mar 2023

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge

jaesunghuh/voxsrc2022 20 Feb 2023

This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022.

17
20 Feb 2023

BER: Balanced Error Rate For Speaker Diarization

x-lance/ber 8 Nov 2022

Diarization error rate (DER) is the primary metric for evaluating diarization performance, but it faces a dilemma: errors in short utterances or segments tend to be overwhelmed by those in longer ones.

24
08 Nov 2022
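The dilemma described in the entry above can be seen with a small calculation. The sketch below assumes the usual duration-weighted form of DER, where the total error time (missed speech, false alarm, and speaker confusion combined) is divided by the total reference speech time. The two segments and their error durations are made-up numbers, and the per-segment average is shown only as a contrast; it is not the BER metric proposed in the paper.

```python
def der(segments):
    """Duration-weighted error rate over (reference_seconds, error_seconds) pairs.

    error_seconds lumps together missed speech, false alarm, and
    speaker confusion for that segment.
    """
    total_reference = sum(ref for ref, _ in segments)
    total_error = sum(err for _, err in segments)
    return total_error / total_reference


def mean_segment_error(segments):
    """Unweighted average of per-segment error rates (illustrative contrast only)."""
    return sum(err / ref for ref, err in segments) / len(segments)


# Hypothetical data: a long segment with little error, a short one fully wrong.
long_segment = (100.0, 5.0)   # 100 s of speech, 5 s in error
short_segment = (2.0, 2.0)    # 2 s of speech, all of it in error

pooled = der([long_segment, short_segment])
averaged = mean_segment_error([long_segment, short_segment])

print(f"DER (duration-weighted): {pooled:.3f}")
print(f"Mean per-segment error:  {averaged:.3f}")
```

Under these assumptions the short segment is entirely wrong, yet it moves the pooled figure only slightly because the long segment dominates the denominator.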

On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors

zaharah/ood_audio 27 Oct 2022

Out-of-distribution (OOD) detection is concerned with identifying data points that do not belong to the same distribution as the model's training data.

4
27 Oct 2022

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

wq2012/SpectralCluster 25 Oct 2022

While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems.

490
25 Oct 2022