Keyword Spotting
96 papers with code • 10 benchmarks • 8 datasets
In speech processing, keyword spotting deals with the identification of keywords in utterances.
Most implemented papers
MLPerf Tiny Benchmark
Advancements in ultra-low-power tiny machine learning (TinyML) systems promise to unlock an entirely new class of smart applications.
SSAST: Self-Supervised Audio Spectrogram Transformer
Pure Transformer models tend to require more training data than CNNs, and the success of the Audio Spectrogram Transformer (AST) relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, which limits its practical usage.
Progressive Continual Learning for Spoken Keyword Spotting
Catastrophic forgetting is a thorny challenge when updating keyword spotting (KWS) models after deployment.
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit
PaddleSpeech is an open-source all-in-one speech toolkit.
Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining
This paper explores the effectiveness of SSL on small models for KWS and establishes that SSL can enhance the performance of small KWS models when labelled data is scarce.
What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision
We present a novel method for aligning a sequence of instructions to a video of someone carrying out a task.
GTM-UVigo Systems for the Query-by-Example Search on Speech Task at MediaEval 2015
In this paper, we present the systems developed by the GTM-UVigo team for the Query-by-Example Search on Speech Task (QUESST) at MediaEval 2015.
Zero-shot keyword spotting for visual speech recognition in-the-wild
Visual keyword spotting (KWS) is the problem of estimating whether a text query occurs in a given recording using only video information.
Federated Learning for Keyword Spotting
We propose a practical approach based on federated learning to solve out-of-domain issues with continuously running embedded speech-based models such as wake word detectors.