SG-VAD: Stochastic Gates Based Speech Activity Detection

28 Oct 2022 · Jonathan Svirsky, Ofir Lindenbaum

We propose a novel voice activity detection (VAD) model for low-resource environments. Our key idea is to model VAD as a denoising task and to construct a network designed to identify nuisance features for a speech classification task. We train the model to identify irrelevant features and predict the type of speech event simultaneously. The model contains only 7.8K parameters, outperforms previously proposed methods on the AVA-Speech evaluation set, and provides comparable results on the HAVIC dataset. We present its architecture, experimental results, and an ablation study of the model's components. The code and models are available at https://www.github.com/jsvir/vad.
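
To make the idea of gating out nuisance features concrete, here is a minimal sketch of a generic stochastic gate using a clipped Gaussian relaxation. It is not the SG-VAD architecture: the function names (`sample_gates`, `open_gate_penalty`, `apply_gates`), the noise scale `sigma`, and the toy values are all assumptions made for illustration, and in a real model the gate parameters would be learned or predicted by a network rather than fixed by hand.

```python
import math
import random


def sample_gates(mu, sigma=0.5, training=True, rng=random):
    """Return one gate value in [0, 1] per feature.

    During training, Gaussian noise is added to each gate parameter
    before clipping; at inference the noise is dropped, so the gates
    become deterministic.
    """
    gates = []
    for m in mu:
        noise = rng.gauss(0.0, sigma) if training else 0.0
        gates.append(min(1.0, max(0.0, m + noise)))
    return gates


def open_gate_penalty(mu, sigma=0.5):
    """Sparsity regularizer: sum over features of P(gate > 0).

    For a gate parameter m with Gaussian noise of scale sigma, this
    probability is the standard normal CDF evaluated at m / sigma.
    """
    return sum(0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu)


def apply_gates(features, gates):
    """Multiply each feature by its gate; a gate of 0 removes the feature."""
    return [x * g for x, g in zip(features, gates)]


# Hypothetical gate parameters and input features (illustrative values only).
mu = [1.2, -0.3, 0.6]
features = [0.9, 0.4, -0.7]

# Inference: the second gate is clipped to 0, so that feature is suppressed.
clean = apply_gates(features, sample_gates(mu, training=False))
penalty = open_gate_penalty(mu)
```

In a setup like this, the penalty would be added to the classification loss so that gates close on features that do not help the prediction, which is one way to train feature selection and classification together.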

Results from the Paper


Ranked #3 on Activity Detection on AVA-Speech (ROC-AUC metric)

Task                Dataset     Model          Metric Name  Metric Value  Global Rank
Activity Detection  AVA-Speech  SG-VAD (ours)  ROC-AUC      94.3          #3
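
For readers unfamiliar with the metric, the following sketch computes ROC-AUC as the fraction of (speech, non-speech) pairs in which the speech example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper. The 94.3 above is presumably the same quantity expressed as a percentage, which is an assumption here.

```python
def roc_auc(labels, scores):
    """Pairwise ROC-AUC: probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: 1 = speech, 0 = non-speech (illustrative values only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 pairs are ordered correctly
```

This quadratic-time version is meant for clarity. Evaluation on a full dataset would normally use a sort-based routine from an established library.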
