Speaker Diarization

13 papers with code · Speech

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
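As a rough illustration of the co-indexing idea and of speaker-attributed transcription, the sketch below assigns time-stamped words to diarization segments. The segment and word tuples, the anonymous labels `spk0`/`spk1`, and the helper `attribute_words` are hypothetical names chosen for this example; they are not taken from any of the listed papers or repositories.

```python
def attribute_words(words, segments):
    """Assign each word to the diarization segment that contains its midpoint.

    words:    list of (text, start_sec, end_sec) tuples, e.g. from a speech recognizer
    segments: list of (start_sec, end_sec, speaker_label) tuples from a diarizer
    Returns a list of (speaker_label, text); speaker_label is None if no segment matches.
    """
    attributed = []
    for text, start, end in words:
        mid = (start + end) / 2
        speaker = next(
            (spk for seg_start, seg_end, spk in segments if seg_start <= mid < seg_end),
            None,
        )
        attributed.append((speaker, text))
    return attributed


# Hypothetical diarization output: labels are anonymous indices, not identities.
segments = [(0.0, 2.0, "spk0"), (2.0, 3.5, "spk1"), (3.5, 5.0, "spk0")]
# Hypothetical recognizer output with word-level timestamps.
words = [("hello", 0.2, 0.6), ("there", 0.7, 1.1), ("hi", 2.1, 2.4), ("again", 3.8, 4.3)]

result = attribute_words(words, segments)
# The number of distinct speakers falls out as a by-product of the grouping.
num_speakers = len({spk for _, _, spk in segments})
```

The first and third segments share the label `spk0`, which is what co-indexing means here: the system only claims that the same (unnamed) speaker is talking in both.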

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Fully Supervised Speaker Diarization

10 Oct 2018 · google/uis-rnn

In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN).

SPEAKER DIARIZATION

Speaker Diarization with LSTM

28 Oct 2017 · wq2012/SpectralCluster

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.

SPEAKER DIARIZATION · SPEAKER VERIFICATION

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

5 Jan 2019 · cvdfoundation/ava-dataset

The dataset contains temporally labeled face tracks in video, where each face instance is labeled as speaking or not, and whether the speech is audible.

SPEAKER DIARIZATION · SPEECH ENHANCEMENT

End-to-End Neural Speaker Diarization with Self-attention

13 Sep 2019 · hitachi-speech/EEND

Our method even outperformed the state-of-the-art x-vector clustering-based method.

SPEAKER DIARIZATION

End-to-End Neural Speaker Diarization with Permutation-Free Objectives

12 Sep 2019 · hitachi-speech/EEND

To realize such a model, we formulate the speaker diarization problem as a multi-label classification problem, and introduce a permutation-free objective function to directly minimize diarization errors without suffering from the speaker-label permutation problem.

DOMAIN ADAPTATION · MULTI-LABEL CLASSIFICATION · SPEAKER DIARIZATION
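The permutation-free objective mentioned in this entry can be sketched in a few lines. This is a minimal pure-Python illustration, not the EEND repository's implementation: it assumes per-frame speaker-activity probabilities and binary reference labels, and takes the minimum of the frame-averaged binary cross-entropy over all orderings of the reference speakers. The function names `bce` and `permutation_free_loss` are hypothetical.

```python
import math
from itertools import permutations


def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one predicted probability and one 0/1 label."""
    prob = min(max(prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))


def permutation_free_loss(probs, labels):
    """Minimum mean BCE over all permutations of the reference speaker columns.

    probs:  T frames, each a list of C predicted speaker-activity probabilities
    labels: T frames, each a list of C binary reference activities
    """
    n_frames, n_speakers = len(probs), len(probs[0])
    best = math.inf
    for perm in permutations(range(n_speakers)):
        total = sum(
            bce(probs[t][c], labels[t][perm[c]])
            for t in range(n_frames)
            for c in range(n_speakers)
        )
        best = min(best, total / (n_frames * n_speakers))
    return best


# Two speakers, three frames; the middle frame contains overlapping speech.
probs = [[0.9, 0.1], [0.9, 0.8], [0.1, 0.9]]
labels = [[1, 0], [1, 1], [0, 1]]
swapped = [[0, 1], [1, 1], [1, 0]]  # same reference with the speaker columns exchanged

loss = permutation_free_loss(probs, labels)
loss_swapped = permutation_free_loss(probs, swapped)
```

Because the minimum runs over every ordering, `loss` and `loss_swapped` are equal: the model is not penalized for which output channel it assigns to which speaker. Enumerating all permutations costs C! evaluations, so this brute-force form is only practical for a small number of speakers.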

Supervised online diarization with sample mean loss for multi-domain data

4 Nov 2019 · DonkeyShot21/uis-rnn-sml

Recently, a fully supervised speaker diarization approach (UIS-RNN) was proposed, which models speakers using multiple instances of a parameter-sharing recurrent neural network.

SPEAKER DIARIZATION

Phoneme Boundary Detection using Learnable Segmental Features

11 Feb 2020 · felixkreuk/SegFeat

Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, keyword spotting, etc.

BOUNDARY DETECTION · KEYWORD SPOTTING · SPEAKER DIARIZATION

Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap

5 Mar 2020 · tango4j/Auto-Tuning-Spectral-Clustering

In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.

SPEAKER DIARIZATION
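For readers unfamiliar with eigengaps, the following sketch shows the generic eigengap heuristic for estimating the number of clusters (here, speakers) from an affinity matrix. It is a simplified illustration and not the paper's normalized maximum eigengap procedure, which additionally tunes how the affinity matrix is pruned. The function name `estimate_num_speakers` and the toy affinity matrix are assumptions made for this example; NumPy is assumed to be installed.

```python
import numpy as np


def estimate_num_speakers(affinity):
    """Estimate the cluster count from the largest gap in the Laplacian spectrum.

    affinity: symmetric (N, N) matrix of non-negative segment similarities.
    """
    affinity = np.asarray(affinity, dtype=float)
    degree = np.diag(affinity.sum(axis=1))
    laplacian = degree - affinity  # unnormalized graph Laplacian
    eigenvalues = np.sort(np.linalg.eigvalsh(laplacian))  # ascending order
    gaps = np.diff(eigenvalues)  # gaps[i] = eigenvalues[i + 1] - eigenvalues[i]
    # The largest gap after the k-th smallest eigenvalue suggests k clusters.
    return int(np.argmax(gaps)) + 1


# Toy affinity for five segments: the first three are mutually similar,
# the last two are mutually similar, and the two groups are unrelated.
affinity = np.array(
    [
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1],
    ]
)

num_speakers = estimate_num_speakers(affinity)
```

On this idealized block-diagonal matrix the Laplacian has two near-zero eigenvalues followed by a clear jump, so the heuristic returns 2. Real affinity matrices are noisy, which is why pruning and thresholding parameters matter in practice.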

LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

23 Jul 2019 · cvqluu/nn-similarity-diarization

More and more neural network approaches have achieved considerable improvements on submodules of speaker diarization systems, including speaker change detection and segment-wise speaker embedding extraction.

SPEAKER DIARIZATION

End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification

24 Feb 2020 · Xflick/EEND_PyTorch

However, the clustering-based approach has a number of problems; i.e., (i) it is not optimized to minimize diarization errors directly, (ii) it cannot handle speaker overlaps correctly, and (iii) it has trouble adapting its speaker embedding models to real audio recordings with speaker overlaps.

MULTI-LABEL CLASSIFICATION · SPEAKER DIARIZATION