Speaker Diarization
74 papers with code • 12 benchmarks • 11 datasets
Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.
Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
Libraries
Use these libraries to find Speaker Diarization models and implementationsDatasets
Latest papers
3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization
This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization.
Online speaker diarization of meetings guided by speech separation
The results show that our system improves the state-of-the-art on the AMI headset mix, using no oracle information and under full evaluation (no collar and including overlapped speech).
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system.
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose a multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs.
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Until recently, the field of speaker diarization was dominated by cascaded systems.
Powerset multi-class cross entropy loss for neural speaker diarization
Since its introduction in 2019, the whole end-to-end neural diarization (EEND) line of work has been addressing speaker diarization as a frame-wise multi-label classification problem with permutation-invariant training.
Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors
This work proposes a frame-wise online/streaming end-to-end neural diarization (FS-EEND) method in a frame-in-frame-out fashion.
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture
We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance.
DiaCorrect: Error Correction Back-end For Speaker Diarization
In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way.
DiariST: Streaming Speech Translation with Speaker Diarization
End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.