TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Keyword Spotting	Google Speech Commands	TripletLoss-res15	Google Speech Commands V1 12	98.56	# 1
Keyword Spotting	Google Speech Commands	TripletLoss-res15	Google Speech Commands V2 12	98.37	# 5
Keyword Spotting	Google Speech Commands	TripletLoss-res15	Google Speech Commands V2 35	97.0	# 9

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-efficient-representations-for-3/keyword-spotting-on-google-speech-commands)](https://paperswithcode.com/sota/keyword-spotting-on-google-speech-commands?p=learning-efficient-representations-for-3)`

Learning Efficient Representations for Keyword Spotting with Triplet Loss

SPECOM 2021 · Roman Vygon, Nikolay Mikhaylovskiy

In the past few years, triplet loss-based metric embeddings have become a de facto standard for several important computer vision problems, most notably person re-identification. In speech recognition, by contrast, the metric embeddings generated by the triplet loss are rarely used, even for classification problems. We fill this gap by showing that a combination of two representation learning techniques, a triplet loss-based embedding and a variant of kNN for classification instead of cross-entropy loss, significantly (by 26% to 38%) improves the classification accuracy of convolutional networks on the LibriSpeech-derived LibriWords datasets. To do so, we propose a novel phonetic similarity-based triplet mining approach. We also improve the best published results on the Google Speech Commands dataset: V1 10+2-class classification by about 34%, achieving 98.55% accuracy; V2 10+2-class classification by about 20%, achieving 98.37% accuracy; and V2 35-class classification by over 50%, achieving 97.0% accuracy.
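The two ingredients named in the abstract, a triplet loss on embeddings and kNN classification over those embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Euclidean distance, the `margin` value, the plain majority-vote kNN, and the function names `triplet_loss` and `knn_predict` are all assumptions made for illustration. The paper uses a kNN variant and phonetic similarity-based triplet mining, neither of which is reproduced here.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss over batches of embeddings, shape [batch, dim].

    Assumption: Euclidean distance and margin=0.5 are illustrative choices,
    not values taken from the paper.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor to same-class sample
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor to other-class sample
    # The loss is zero once the negative is farther than the positive by the margin.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


def knn_predict(train_emb, train_labels, query_emb, k=3):
    """Plain majority-vote kNN on embeddings (hypothetical stand-in for the
    paper's kNN variant). Labels must be non-negative integers."""
    # Pairwise Euclidean distances, shape [n_query, n_train].
    dists = np.linalg.norm(query_emb[:, None, :] - train_emb[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]  # indices of the k closest training points
    votes = train_labels[nearest]               # labels of those neighbours
    return np.array([np.bincount(row).argmax() for row in votes])


# Toy 2-D "embeddings": two well-separated classes.
train_emb = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
                      [1.0, 1.0], [1.0, 1.1], [1.1, 1.0]])
train_labels = np.array([0, 0, 0, 1, 1, 1])
query_emb = np.array([[0.05, 0.05], [0.95, 1.0]])

predictions = knn_predict(train_emb, train_labels, query_emb, k=3)
```

In a real system the embeddings would come from a trained network (the results table lists a res15 model) rather than hand-written points; the sketch only shows how a classifier can operate on distances in the embedding space instead of on softmax outputs.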

Code

roman-vygon/triplet_loss_kws official

Tasks

Classification

Keyword Spotting

Representation Learning

Speech Recognition

Datasets

LibriSpeech

Speech Commands

Results from the Paper

Ranked #1 on Keyword Spotting on Google Speech Commands
