Speaker Diarization

74 papers with code • 12 benchmarks • 11 datasets

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
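
As a rough illustration of the definition above, the sketch below shows how diarization output (time segments carrying anonymous speaker labels) might be combined with time-stamped words from a speech recognizer to produce a speaker-attributed transcript. The `Segment` and `Word` types, the `attribute_words` helper, the maximum-overlap rule, and all timings are hypothetical choices made for this example; they are not taken from the cited paper or from any particular toolkit.

```python
from collections import namedtuple

# Hypothetical containers: a diarization segment and a recognized word,
# both with start/end times in seconds.
Segment = namedtuple("Segment", "start end speaker")
Word = namedtuple("Word", "start end text")


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(segments, words):
    """Label each word with the speaker whose segment overlaps it most.

    A word that overlaps no segment falls back to the first segment,
    which a real system would want to handle explicitly.
    """
    attributed = []
    for w in words:
        best = max(
            segments,
            key=lambda s: overlap(s.start, s.end, w.start, w.end),
        )
        attributed.append((best.speaker, w.text))
    return attributed


# Made-up diarization output: labels are arbitrary indices ("spk0", "spk1"),
# not identities; the same label marks segments from the same speaker.
segments = [
    Segment(0.0, 2.0, "spk0"),
    Segment(2.0, 4.5, "spk1"),
    Segment(4.5, 6.0, "spk0"),
]

# Made-up recognizer output with word-level timestamps.
words = [
    Word(0.2, 0.6, "hello"),
    Word(0.7, 1.1, "there"),
    Word(2.3, 2.8, "hi"),
    Word(4.6, 5.0, "thanks"),
]

result = attribute_words(segments, words)

# The number of distinct speakers falls out of the labels as a by-product.
num_speakers = len({s.speaker for s in segments})

for speaker, text in result:
    print(f"{speaker}: {text}")
```

The labels only say which segments share a speaker, so swapping `spk0` and `spk1` everywhere would describe the same diarization.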

Benchmarks

These leaderboards track progress in speaker diarization.

Dataset	Best Model
CALLHOME	TOLD
NIST-SRE 2000	x-vector (MCGAN)
AMI Lapel	TitaNet-M (NME-SC)
AMI MixHeadset	TitaNet-L (NME-SC)
CH109	TitaNet-S (NME-SC)
DIHARD	pyannote (waveform)
ETAPE	pyannote (waveform)
CALLHOME-109	TitaNet-S
AMI	pyannote (waveform)
Hub5'00 CallHome	UIS-RNN
DIHARD II	UIS-RNN-SML
AliMeeting	SOND

Libraries

Use these libraries to find speaker diarization models and implementations.

Repository	Papers	Stars
hitachi-speech/EEND	5	347
pyannote/pyannote-audio	3	5,027
alibaba-damo-academy/FunASR	3	3,284
wq2012/SpectralCluster	3	490

See all 5 libraries.

Latest papers

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

alibaba-damo-academy/3D-Speaker • 29 Mar 2024

This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization.

709

Online speaker diarization of meetings guided by speech separation

egruttadauria98/sspavaldo • 30 Jan 2024

The results show that our system improves the state-of-the-art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

google/speaker-id • 7 Jan 2024

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system.

311

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

zqs01/multi-channel-wav2vec2 • 7 Jan 2024

Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose a multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs.

DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors

butspeechfit/diaper • 7 Dec 2023

Until recently, the field of speaker diarization was dominated by cascaded systems.

Powerset multi-class cross entropy loss for neural speaker diarization

frenchkrab/is2023-powerset-diarization • 19 Oct 2023

Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work has been addressing speaker diarization as a frame-wise multi-label classification problem with permutation-invariant training.

Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors

audio-westlakeu/fs-eend • 25 Sep 2023

This work proposes a frame-wise online/streaming end-to-end neural diarization (FS-EEND) method in a frame-in-frame-out fashion.

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

liyunlongaaa/nsd-ms2s • 17 Sep 2023

We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance.

DiaCorrect: Error Correction Back-end For Speaker Diarization

butspeechfit/diacorrect • 15 Sep 2023

In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.

DiariST: Streaming Speech Translation with Speaker Diarization

mu-y/diarist • 14 Sep 2023

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.
