Speaker Diarization

74 papers with code • 12 benchmarks • 11 datasets

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
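The definition above (find segments, group them by speaker, and obtain the speaker count as a by-product) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the segment boundaries and two-dimensional "embeddings" are toy values, `co_index` is a hypothetical helper, and the greedy cosine-similarity grouping with a fixed threshold stands in for the clustering methods used by real systems.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def co_index(segments, threshold=0.8):
    """Give each (start, end, embedding) segment an anonymous speaker index.

    Hypothetical helper: greedy grouping against running per-speaker sums.
    Speakers are not identified, only co-indexed.
    """
    centroids = []  # one running embedding sum per discovered speaker
    labels = []
    for start, end, emb in segments:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            # No existing speaker is similar enough: open a new index.
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            centroids[best] = [x + y for x, y in zip(centroids[best], emb)]
        labels.append((start, end, best))
    # The number of distinct speakers falls out as a by-product.
    return labels, len(centroids)

# Toy input: four segments with made-up embeddings.
segments = [
    (0.0, 1.5, [1.0, 0.1]),
    (1.5, 3.0, [0.1, 1.0]),
    (3.0, 4.0, [0.9, 0.2]),
    (4.0, 5.5, [0.0, 1.0]),
]
labels, num_speakers = co_index(segments)
for start, end, speaker in labels:
    print(f"{start:.1f}-{end:.1f}s: speaker_{speaker}")
print("distinct speakers:", num_speakers)
```

The output indices are arbitrary labels, which matches the task definition: segments are attributed to the same speaker without naming that speaker.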

Benchmarks

These leaderboards are used to track progress in Speaker Diarization.

Dataset	Best Model
CALLHOME	TOLD
NIST-SRE 2000	x-vector (MCGAN)
AMI Lapel	TitaNet-M (NME-SC)
AMI MixHeadset	TitaNet-L (NME-SC)
CH109	TitaNet-S (NME-SC)
DIHARD	pyannote (waveform)
ETAPE	pyannote (waveform)
CALLHOME-109	titanet-s
AMI	pyannote (waveform)
Hub5'00 CallHome	UIS-RNN
DIHARD II	UIS-RNN-SML
AliMeeting	SOND

Libraries

Use these libraries to find Speaker Diarization models and implementations.

Library	Papers	Stars
hitachi-speech/EEND	5	350
pyannote/pyannote-audio	3	5,090
alibaba-damo-academy/FunASR	3	3,417
wq2012/SpectralCluster	3	490

See all 5 libraries.

Latest papers

Long-term Conversation Analysis: Exploring Utility and Privacy

ol-mega/ppca • 28 Jun 2023

The analysis of conversations recorded in everyday life requires privacy protection.

Speech Emotion Diarization: Which Emotion Appears When?

speechbrain/speechbrain • 22 Jun 2023 • 7,911 stars

Speech Emotion Recognition (SER) typically relies on utterance-level solutions.

Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks

audio-westlakeu/audiossl • 7 Jun 2023

In order to tackle both clip-level and frame-level tasks, this paper proposes Audio Teacher-Student Transformer (ATST), with a clip-level version (named ATST-Clip) and a frame-level version (named ATST-Frame), responsible for learning clip-level and frame-level representations, respectively.

Neural Diarization with Non-autoregressive Intermediate Attractors

hitachi-speech/EEND • 13 Mar 2023 • 350 stars

The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.

TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization

alibaba-damo-academy/FunASR • 8 Mar 2023 • 3,417 stars

Recently, end-to-end neural diarization (EEND) is introduced and achieves promising results in speaker-overlapped scenarios.

A Light Weight Model for Active Speaker Detection

junhua-liao/light-asd • CVPR 2023 • 08 Mar 2023

Experimental results on the AVA-ActiveSpeaker dataset show that our framework achieves competitive mAP performance (94.1% vs. 94.2%), while the resource costs are significantly lower than the state-of-the-art method, especially in model parameters (1.0M vs. 22.5M, about 23x) and FLOPs (0.6G vs. 2.6G, about 4x).

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge

jaesunghuh/voxsrc2022 • 20 Feb 2023

This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022.

BER: Balanced Error Rate For Speaker Diarization

x-lance/ber • 8 Nov 2022

DER is the primary metric to evaluate diarization performance while facing a dilemma: the errors in short utterances or segments tend to be overwhelmed by longer ones.
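The dilemma described in this abstract can be made concrete with a small sketch. It assumes a deliberately simplified, frame-level form of DER: (missed speech + false alarm + speaker confusion) divided by the total reference speech, with no collar, no overlapped speech, and hypothesis labels already mapped to reference labels. The function name `frame_der` and the toy label sequences are hypothetical, and this is not the BER metric proposed in the paper.

```python
def frame_der(reference, hypothesis):
    """Simplified DER over aligned frame labels; None marks non-speech.

    Assumes one speaker per frame and that hypothesis labels are already
    mapped to reference labels (real scoring tools find that mapping).
    """
    missed = false_alarm = confusion = ref_speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            ref_speech += 1
            if hyp is None:
                missed += 1        # speech scored as silence
            elif hyp != ref:
                confusion += 1     # speech given to the wrong speaker
        elif hyp is not None:
            false_alarm += 1       # silence scored as speech
    return (missed + false_alarm + confusion) / ref_speech

# Toy recording: speaker A talks for 95 frames, speaker B for only 5.
reference = ["A"] * 95 + ["B"] * 5
# The hypothesis never detects speaker B at all.
hypothesis = ["A"] * 100

der = frame_der(reference, hypothesis)
print(f"DER = {der:.2%}")
```

In this toy case every frame of speaker B is wrong, yet the score stays low because errors are weighted by duration, which is the imbalance the abstract points to.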


On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors

zaharah/ood_audio • 27 Oct 2022

Out-of-distribution (OOD) detection is concerned with identifying data points that do not belong to the same distribution as the model's training data.

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

wq2012/SpectralCluster • 25 Oct 2022 • 490 stars

While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems.
