TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Keyword Spotting	VoxForge	2D-ConvNet	Accuracy (%)	95.4	# 2
Keyword Spotting	VoxForge	1D-ConvNet	Accuracy (%)	93.7	# 1
Spoken language identification	VoxForge Commonwealth	1D ConvNet(MixUp=NO)	Accuracy (%)	93.7	# 4
Spoken language identification	VoxForge Commonwealth	2D ConvNet with Attention and GRU(MixUp=YES)	Accuracy (%)	95.0	# 2
Spoken language identification	VoxForge Commonwealth	2D ConvNet(MixUp=YES)	Accuracy (%)	95.4	# 1
Spoken language identification	VoxForge Commonwealth	2D ConvNet(MixUp=NO)	Accuracy (%)	94.3	# 3
Spoken language identification	VoxForge European	2D ConvNet(MixUp=NO)	Accuracy (%)	96.0	# 2
Spoken language identification	VoxForge European	1D ConvNet(MixUp=NO)	Accuracy (%)	94.4	# 4
Spoken language identification	VoxForge European	2D ConvNet with Attention and GRU(MixUp=NO)	Accuracy (%)	94.7	# 3
Spoken language identification	VoxForge European	2D ConvNet with Attention and GRU(MixUp=YES)	Accuracy (%)	93.7	# 5
Spoken language identification	VoxForge European	2D ConvNet(MixUp=YES)	Accuracy (%)	96.3	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spoken-language-identification-using-convnets/keyword-spotting-on-voxforge)](https://paperswithcode.com/sota/keyword-spotting-on-voxforge?p=spoken-language-identification-using-convnets)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spoken-language-identification-using-convnets/spoken-language-identification-on-voxforge)](https://paperswithcode.com/sota/spoken-language-identification-on-voxforge?p=spoken-language-identification-using-convnets)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spoken-language-identification-using-convnets/spoken-language-identification-on-voxforge-1)](https://paperswithcode.com/sota/spoken-language-identification-on-voxforge-1?p=spoken-language-identification-using-convnets)`

Spoken Language Identification using ConvNets

9 Oct 2019 · Sarthak, Shikhar Shukla, Govind Mittal

Language Identification (LI) is an important first step in several speech processing systems. With a growing number of voice-based assistants, speech LI has emerged as a widely researched field. To approach the problem of identifying languages, we can either adopt an implicit approach, where only the speech for a language is present, or an explicit one, where the speech is available with its corresponding transcript. This paper focuses on the implicit approach due to the absence of transcribed data. It benchmarks existing models and proposes a new attention-based model for language identification that uses log-Mel spectrogram images as input. We also present the effectiveness of raw waveforms as features to neural network models for LI tasks. For training and evaluation, we used the VoxForge dataset, classifying six languages (English, French, German, Spanish, Russian and Italian) with an accuracy of 95.4% and four languages (English, French, German, Spanish) with an accuracy of 96.3%. This approach can be scaled further to incorporate more languages.
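As a rough illustration of the log-Mel spectrogram input mentioned in the abstract, the following is a minimal NumPy sketch. It is not the authors' pipeline: the function names and every parameter (16 kHz sample rate, 512-point FFT, hop of 160 samples, 40 Mel bands, Hann window, dB scaling) are assumptions chosen for illustration, since the abstract does not state them.

```python
import numpy as np


def hz_to_mel(hz):
    # Common HTK-style Mel formula (an assumption; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 edge points, evenly spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(wave, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Return a (n_mels, n_frames) log-Mel spectrogram in dB.

    Assumes len(wave) >= n_fft; trailing samples that do not fill a frame are dropped.
    """
    wave = np.asarray(wave, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel_power = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # Small constant avoids log(0) on silent frames.
    return 10.0 * np.log10(mel_power + 1e-10).T


# Usage: one second of a 1 kHz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2 * np.pi * 1000.0 * t))
print(spec.shape)
```

The resulting 2D array can be treated as a single-channel image and passed to a 2D ConvNet, whereas a 1D ConvNet on raw waveforms would consume `wave` directly and skip this step.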

Tasks

Keyword Spotting

Language Identification

Spoken language identification

Datasets

VoxForge

Results from the Paper

Ranked #1 on Spoken language identification on VoxForge Commonwealth
