TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speaker Identification	VoxCeleb1	AutoSpeech (N=8,C=128)	Top-1 (%)	87.66	# 6
Speaker Identification	VoxCeleb1	AutoSpeech (N=8,C=128)	Top-5 (%)	96.01	# 1
Speaker Identification	VoxCeleb1	AutoSpeech (N=8,C=128)	Number of Params	18M	# 1
Speaker Identification	VoxCeleb1	AutoSpeech (N=8,C=128)	Accuracy	87.66	# 6
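
The Top-1 and Top-5 rows above are closed-set identification accuracies. As a minimal sketch of how such a metric is usually computed (the function name `top_k_accuracy` and the toy scores are illustrative assumptions, not taken from the paper or its code):

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Percentage of utterances whose true speaker is among the k top-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one utterance")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this utterance.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy data (hypothetical): 4 utterances scored against 3 speakers.
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
]
toy_labels = [0, 2, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With k=1 this reduces to plain accuracy, which is consistent with the Top-1 (%) and Accuracy rows reporting the same value.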

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/autospeech-neural-architecture-search-for/speaker-identification-on-voxceleb1)](https://paperswithcode.com/sota/speaker-identification-on-voxceleb1?p=autospeech-neural-architecture-search-for)`

AutoSpeech: Neural Architecture Search for Speaker Recognition

7 May 2020 · Shaojin Ding, Tianlong Chen, Xinyu Gong, Weiwei Zha, Zhangyang Wang

Speaker recognition systems based on Convolutional Neural Networks (CNNs) are often built with off-the-shelf backbones such as VGG-Net or ResNet. However, these backbones were originally proposed for image classification and therefore may not be a natural fit for speaker recognition. Because manually exploring the design space is prohibitively complex, we propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech. Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times. The final speaker recognition model is obtained by training the derived CNN model with the standard scheme. To evaluate the proposed approach, we conduct experiments on both speaker identification and speaker verification tasks using the VoxCeleb1 dataset. Results demonstrate that the CNN architectures derived by the proposed approach significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while having lower model complexity.
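
The two-stage idea in the abstract (pick the best operation per cell edge, then stack the resulting cell) can be sketched in plain Python. This is only an illustration under stated assumptions: the abstract does not describe the search procedure, so the per-edge scores, the operation names, and the helpers `derive_cell` and `stack_cells` are hypothetical, and reading N=8 in the model name as the number of stacked cells is an assumption.

```python
import math
from typing import Dict, List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def derive_cell(arch_scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each edge of the cell, keep the candidate operation with the highest weight."""
    cell = {}
    for edge, op_scores in arch_scores.items():
        names = list(op_scores)
        probs = softmax([op_scores[name] for name in names])
        cell[edge] = names[probs.index(max(probs))]
    return cell


def stack_cells(cell: Dict[str, str], num_cells: int) -> List[Dict[str, str]]:
    """Build the network description by repeating the same cell num_cells times."""
    return [dict(cell) for _ in range(num_cells)]


# Hypothetical scores, standing in for the outcome of a search phase.
arch_scores = {
    "0->1": {"sep_conv_3x3": 1.2, "max_pool_3x3": 0.3, "skip_connect": -0.5},
    "0->2": {"sep_conv_3x3": 0.1, "max_pool_3x3": 0.9, "skip_connect": 0.2},
    "1->2": {"sep_conv_3x3": -0.4, "max_pool_3x3": 0.0, "skip_connect": 0.8},
}

cell = derive_cell(arch_scores)
network = stack_cells(cell, num_cells=8)  # assumption: N=8 means 8 stacked cells
```

The derived network would then be trained from scratch as an ordinary CNN, which is the "standard scheme" step the abstract mentions; that step is omitted here.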

Code

TAMU-VITA/AutoSpeech official

205

VITA-Group/AutoSpeech

205

JeongwookUm/TEST_AutoSpeech-master

Tasks

Image Classification

Neural Architecture Search

Speaker Identification

Speaker Recognition

Speaker Verification

Datasets

VoxCeleb1

Results from the Paper

Ranked #6 on Speaker Identification on VoxCeleb1 (using extra training data)

Methods

1x1 Convolution • Average Pooling • Batch Normalization • Bottleneck Residual Block • Convolution • Global Average Pooling • Kaiming Initialization • Max Pooling • ReLU • Residual Block • Residual Connection • ResNet
