TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Audio Classification	Balanced Audio Set	Base (ours)	Mean AP	37.4	# 2
Spoken Command Recognition	Speech Command v2	Base (ours)	Accuracy	98.0	# 2
Speaker Identification	VoxCeleb1	ATST Base (ours)	Top-1 (%)	94.3	# 4
Speaker Identification	VoxCeleb1	ATST Base (ours)	Accuracy	94.3	# 4

ATST: Audio Representation Learning with Teacher-Student Transformer

26 Apr 2022 · Xian Li, Xiaofei Li

Self-supervised learning (SSL) learns knowledge from a large amount of unlabeled data and then transfers that knowledge to a specific problem with a limited amount of labeled data. SSL has achieved promising results in various domains. This work addresses the problem of segment-level general audio SSL and proposes a new transformer-based teacher-student SSL model, named ATST. A transformer encoder is developed on a recently emerged teacher-student baseline scheme, which largely improves the modeling capability of pre-training. In addition, a new strategy for positive pair creation is designed to fully leverage the capability of the transformer. Extensive experiments have been conducted, and the proposed model achieves new state-of-the-art results on almost all of the downstream tasks.
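The abstract builds on a teacher-student baseline scheme, and this page lists BYOL under Methods. The following is a minimal sketch of the two generic ingredients of a BYOL-style scheme, assuming plain Python lists in place of real network outputs and weights: a loss that pulls the student's prediction toward the teacher's projection of a positive pair, and an exponential-moving-average (EMA) update of the teacher. The function names, the toy vectors, and the decay value `tau` are illustrative assumptions, not taken from the ATST code.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def byol_style_loss(student_pred, teacher_proj):
    """Return 2 - 2 * cosine similarity between the two vectors.

    The loss is 0 when the vectors point the same way and 4 when they
    point in opposite directions.
    """
    p = l2_normalize(student_pred)
    z = l2_normalize(teacher_proj)
    return 2.0 - 2.0 * sum(a * b for a, b in zip(p, z))


def ema_update(teacher_params, student_params, tau=0.99):
    """Move each teacher parameter slightly toward the student parameter.

    tau is a hypothetical decay value chosen only for illustration.
    """
    return [tau * t + (1.0 - tau) * s
            for t, s in zip(teacher_params, student_params)]


if __name__ == "__main__":
    # Toy embeddings standing in for two views of the same audio segment.
    print(byol_style_loss([1.0, 0.0], [0.0, 1.0]))
    print(ema_update([1.0, 1.0], [0.0, 2.0], tau=0.9))
```

In a scheme of this kind only the student receives gradients; the teacher changes solely through the EMA step, which is why the two functions are kept separate here.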

Code

Audio-WestlakeU/audiossl (official)

Audio-WestlakeU/ATST-SED

2024-MindSpore-1/Code6

2023-MindSpore-4/Code8

Tasks

Audio Classification

Instrument Recognition

Representation Learning

Self-Supervised Audio Classification

Self-Supervised Learning

Speaker Identification

Spoken Command Recognition

Datasets

VoxCeleb1

AudioSet

NSynth

Results from the Paper

Ranked #2 on Spoken Command Recognition on Speech Command v2

Methods

BYOL
