Speaker Diarization

74 papers with code • 12 benchmarks • 11 datasets

Speaker diarization is the task of segmenting audio recordings and co-indexing the segments by speaker. As the task is commonly defined, the goal is not to identify known speakers but to co-index segments that are attributed to the same speaker. In other words, diarization means finding speaker boundaries, grouping the segments that belong to the same speaker and, as a by-product, determining the number of distinct speakers. Combined with speech recognition, diarization enables speaker-attributed speech-to-text transcription.
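To make "co-indexing" concrete, here is a minimal Python sketch, assuming a hypothesis is a list of `(start, end, label)` tuples in seconds. Because labels are arbitrary cluster IDs, `canonicalize` renames them in order of first appearance, so two hypotheses with the same grouping compare equal. The speaker count falls out of the grouping, and `attribute_words` illustrates speaker-attributed transcription by giving each ASR word to the speaker with the largest time overlap. The data layout and all function names are illustrative assumptions, not the API of any particular toolkit.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical layout: (start_sec, end_sec, label); labels are arbitrary cluster IDs.
Segment = Tuple[float, float, str]
AsrWord = Tuple[float, float, str]  # (start_sec, end_sec, text)


def canonicalize(segments: List[Segment]) -> List[Segment]:
    """Rename labels in order of first appearance (sorted by start time)."""
    mapping = {}
    out = []
    for start, end, label in sorted(segments):
        if label not in mapping:
            mapping[label] = f"spk{len(mapping)}"
        out.append((start, end, mapping[label]))
    return out


def num_speakers(segments: List[Segment]) -> int:
    """Number of distinct speakers, a by-product of the grouping."""
    return len({label for _, _, label in segments})


def attribute_words(words: List[AsrWord], segments: List[Segment]) -> List[Tuple[str, Optional[str]]]:
    """Assign each word to the speaker whose segments overlap it most (None if no overlap)."""
    result = []
    for w_start, w_end, text in words:
        overlap = Counter()
        for s_start, s_end, label in segments:
            ov = min(w_end, s_end) - max(w_start, s_start)
            if ov > 0:
                overlap[label] += ov
        speaker = overlap.most_common(1)[0][0] if overlap else None
        result.append((text, speaker))
    return result


hyp_a = [(0.0, 2.0, "A"), (2.0, 3.5, "B"), (3.5, 5.0, "A")]
hyp_b = [(3.5, 5.0, "7"), (0.0, 2.0, "7"), (2.0, 3.5, "3")]  # same grouping, other names and order
print(canonicalize(hyp_a) == canonicalize(hyp_b))  # True
print(num_speakers(hyp_a))  # 2
words = [(0.2, 0.6, "hello"), (2.1, 2.5, "hi"), (4.0, 4.4, "bye")]
print(attribute_words(words, hyp_a))  # [('hello', 'A'), ('hi', 'B'), ('bye', 'A')]
```

The relabeling step is why diarization is scored with an optimal mapping between hypothesis and reference labels rather than by comparing label names directly.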

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

alibaba-damo-academy/3D-Speaker 29 Mar 2024

This paper introduces 3D-Speaker-Toolkit, an open-source toolkit for multi-modal speaker verification and diarization.

709 GitHub stars

Online speaker diarization of meetings guided by speech separation

egruttadauria98/sspavaldo 30 Jan 2024

The results show that our system improves the state of the art on the AMI headset mix, using no oracle information and under full evaluation (no collar, overlapped speech included).

16 GitHub stars
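The evaluation conditions mentioned in this entry can be illustrated with a frame-level approximation of the diarization error rate (DER), assuming each frame is represented by the set of speakers active in it. Per frame, the error is max(N_ref, N_hyp) minus the number of correctly mapped speakers, which folds missed speech, false alarms and speaker confusion into one sum. "No collar" means no tolerance region around reference boundaries is excluded, and overlapped frames are scored like any other. The function name `frame_der` and the brute-force speaker mapping are assumptions for illustration; standard scoring tools work on continuous time segments and use an optimal assignment solver instead.

```python
from itertools import permutations
from typing import List, Set


def frame_der(ref: List[Set[str]], hyp: List[Set[str]]) -> float:
    """Frame-level DER with no collar and overlapped frames included.

    ref[t] and hyp[t] are the sets of speakers active in frame t.
    Per frame, error = max(N_ref, N_hyp) - N_correct, where N_correct counts
    hypothesis speakers mapped to a reference speaker active in that frame.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    total_ref = sum(len(r) for r in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    # max(N_ref, N_hyp) does not depend on the mapping, so compute it once.
    upper = sum(max(len(r), len(h)) for r, h in zip(ref, hyp))

    ref_spk = sorted(set().union(*ref))
    hyp_spk = sorted(set().union(*hyp))
    # None slots let a hypothesis speaker stay unmapped; the mapping is one-to-one.
    slots = ref_spk + [None] * len(hyp_spk)

    # Brute force over all mappings: fine for a handful of speakers only.
    best_correct = 0
    for assignment in permutations(slots, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, assignment))
        correct = sum(
            sum(1 for s in h if mapping[s] in r) for r, h in zip(ref, hyp)
        )
        best_correct = max(best_correct, correct)
    return (upper - best_correct) / total_ref


ref = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hyp = [{"x"}, {"x"}, {"x"}, {"y"}, {"y"}]  # misses B in the overlap, false alarm at the end
print(frame_der(ref, hyp))  # 0.4
```

Here the best mapping (x to A, y to B) leaves one missed speaker-frame in the overlap and one false-alarm frame, i.e. 2 errors over 5 reference speaker-frames. Excluding overlapped frames or a collar around boundaries would hide exactly this kind of error.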

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

google/speaker-id 7 Jan 2024

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLMs) to post-process the outputs from a speaker diarization system.

311 GitHub stars
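The post-processing idea in this entry can be sketched as a round trip between word-level diarization output and text an LLM can edit. The sketch assumes words carry integer speaker labels and are serialized with inline `<speaker:N>` tags at each speaker change, a compact format in the spirit of the one described for DiarizationLM; the exact format, the function names and the fallback rule are assumptions. The LLM call itself is left out, and the completion's speaker labels are accepted only when the model kept the word sequence unchanged.

```python
import re
from typing import List, Tuple

Word = Tuple[str, int]  # (word, speaker index); hypothetical layout
TAG = re.compile(r"<speaker:(\d+)>")


def to_prompt(words: List[Word]) -> str:
    """Serialize words, inserting a <speaker:N> tag at every speaker change."""
    parts, prev = [], None
    for word, spk in words:
        if spk != prev:
            parts.append(f"<speaker:{spk}>")
            prev = spk
        parts.append(word)
    return " ".join(parts)


def from_completion(text: str) -> List[Word]:
    """Parse whitespace-separated tagged text back into (word, speaker) pairs."""
    words, spk = [], None
    for token in text.split():
        match = TAG.fullmatch(token)
        if match:
            spk = int(match.group(1))
        elif spk is not None:  # words before the first tag are dropped
            words.append((token, spk))
    return words


def transfer_speakers(original: List[Word], completion: str) -> List[Word]:
    """Adopt the LLM's speaker labels only if it left the words unchanged."""
    revised = from_completion(completion)
    if [w for w, _ in revised] == [w for w, _ in original]:
        return revised
    return original  # a real system would align the two word sequences instead


asr = [("good", 1), ("morning", 1), ("how", 1), ("are", 2), ("you", 2)]
print(to_prompt(asr))  # <speaker:1> good morning how <speaker:2> are you
fixed = "<speaker:1> good morning <speaker:2> how are you"  # hypothetical LLM output
print(transfer_speakers(asr, fixed))
# [('good', 1), ('morning', 1), ('how', 2), ('are', 2), ('you', 2)]
```

The strict fallback keeps the sketch safe but discards useful corrections whenever the LLM also rewrites words, which is why a production pipeline needs a proper word alignment step between the original transcript and the completion.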

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

zqs01/multi-channel-wav2vec2 7 Jan 2024

Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose a multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs.

1 GitHub star

DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors

butspeechfit/diaper 7 Dec 2023

Until recently, the field of speaker diarization was dominated by cascaded systems.

21 GitHub stars

Powerset multi-class cross entropy loss for neural speaker diarization

frenchkrab/is2023-powerset-diarization 19 Oct 2023

Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work has been addressing speaker diarization as a frame-wise multi-label classification problem with permutation-invariant training.

45 GitHub stars
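The frame-wise multi-label formulation with permutation-invariant training (PIT) described in this entry can be sketched in plain Python: each output channel predicts the per-frame activity of one speaker, and the binary cross-entropy is taken under the speaker permutation that fits the reference best. For contrast, `powerset_classes` and `to_powerset` show the powerset idea named in the title, where each frame's set of active speakers becomes one class of a multi-class problem. Capping overlap at `max_overlap` speakers is an assumption for illustration, and none of these names come from the paper's code.

```python
import math
from itertools import combinations, permutations
from typing import List, Sequence, Tuple


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction, with clipping for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pit_bce(probs: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]) -> float:
    """Mean BCE under the best assignment of reference speakers to output channels.

    probs[t][c]:  predicted activity of output channel c at frame t
    labels[t][k]: 0/1 reference activity of speaker k at frame t
    Brute force over K! permutations, so only suitable for small K.
    """
    num_frames, num_spk = len(labels), len(labels[0])
    best = math.inf
    for perm in permutations(range(num_spk)):
        # perm[k] is the output channel assigned to reference speaker k
        total = sum(
            bce(probs[t][perm[k]], labels[t][k])
            for t in range(num_frames)
            for k in range(num_spk)
        )
        best = min(best, total / (num_frames * num_spk))
    return best


def powerset_classes(num_spk: int, max_overlap: int) -> List[Tuple[int, ...]]:
    """All speaker subsets of size <= max_overlap; () is the non-speech class."""
    return [
        subset
        for size in range(max_overlap + 1)
        for subset in combinations(range(num_spk), size)
    ]


def to_powerset(frame_labels: Sequence[int], classes: List[Tuple[int, ...]]) -> int:
    """Map one frame's multi-label vector to its powerset class index."""
    active = tuple(k for k, y in enumerate(frame_labels) if y)
    return classes.index(active)  # ValueError if more than max_overlap are active


labels = [[1, 0], [1, 0], [0, 1]]
probs = [[0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]  # correct activity, channels swapped
print(round(pit_bce(probs, labels), 4))  # 0.1054
classes = powerset_classes(3, 2)
print(len(classes), to_powerset([1, 0, 1], classes))  # 7 5
```

With three speakers and at most two simultaneous, there are seven classes: non-speech, three single speakers and three pairs. Swapped output channels cost nothing under PIT because the loss is minimized over permutations; practical implementations usually replace the K! loop with an assignment solver.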

Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors

audio-westlakeu/fs-eend 25 Sep 2023

This work proposes a frame-wise online/streaming end-to-end neural diarization (FS-EEND) method in a frame-in-frame-out fashion.

57 GitHub stars

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

liyunlongaaa/nsd-ms2s 17 Sep 2023

We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance.

48 GitHub stars

DiaCorrect: Error Correction Back-end For Speaker Diarization

butspeechfit/diacorrect 15 Sep 2023

In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.

6 GitHub stars

DiariST: Streaming Speech Translation with Speaker Diarization

mu-y/diarist 14 Sep 2023

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.

14 GitHub stars