TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Intent Classification	Skit-S2I	Whisper(small.en)	Accuracy (%)	95.6	#1
Speech Intent Classification	Skit-S2I	Hubert(large)	Accuracy (%)	95.5	#2
Speech Intent Classification	Skit-S2I	Wav2vec2(large)	Accuracy (%)	95.3	#3
Speech Intent Classification	Skit-S2I	Whisper(base.en)	Accuracy (%)	94.6	#4

Skit-S2I: An Indian Accented Speech to Intent dataset

26 Dec 2022 · Shangeth Rajaa, Swaraj Dalmia, Kumarmanas Nethil

Conventional conversation assistants extract text transcripts from the speech signal using automatic speech recognition (ASR) and then predict intent from the transcriptions. With end-to-end spoken language understanding (SLU), the speaker's intent is predicted directly from the speech signal, without intermediate text transcripts. As a result, the model can optimize directly for intent classification and avoid cascading errors from ASR. An end-to-end SLU system also helps reduce the latency of intent prediction. Although many text-to-intent datasets are publicly available, labeled speech-to-intent datasets are scarce, and none are available for Indian-accented speech. In this paper, we release the Skit-S2I dataset, the first publicly available Indian-accented SLU dataset in the banking domain in a conversational tonality. We experiment with multiple baselines, compare representations from different pretrained speech encoders, and find that self-supervised learning (SSL) pretrained representations perform slightly better than ASR pretrained representations, which lack prosodic features, for speech-to-intent classification. The dataset and baseline code are available at https://github.com/skit-ai/speech-to-intent-dataset
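To make the end-to-end idea in the abstract concrete, here is a minimal sketch of a speech-to-intent pipeline that goes from waveform to intent with no transcript in between. It is an illustration under stated assumptions, not the authors' baseline: `toy_encoder` is a hypothetical stand-in for a pretrained speech encoder, and the mean-pooling plus linear head, the random weights, and the three intent classes are all assumed for the example.

```python
import math
import random


def toy_encoder(waveform, frame_size=4):
    """Hypothetical stand-in for a pretrained speech encoder.

    Splits the waveform into fixed-size frames and returns one small
    feature vector (mean, energy) per frame.
    """
    frames = [waveform[i:i + frame_size]
              for i in range(0, len(waveform) - frame_size + 1, frame_size)]
    return [[sum(f) / len(f), sum(x * x for x in f) / len(f)] for f in frames]


def mean_pool(features):
    """Average frame-level features over time into one utterance vector."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(waveform, weights, biases):
    """Map a waveform to intent probabilities without any text transcript."""
    pooled = mean_pool(toy_encoder(waveform))
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs


# Toy usage: random audio samples and an untrained (random) linear head.
rng = random.Random(0)
num_intents = 3  # assumed number of intent classes, for illustration only
waveform = [rng.uniform(-1, 1) for _ in range(16)]
weights = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(num_intents)]
biases = [0.0] * num_intents
intent_id, probs = predict_intent(waveform, weights, biases)
```

In a real system the stand-in encoder would be replaced by a pretrained model such as those in the results table, and the head would be trained on labeled utterances. A cascaded system would instead insert an ASR step and classify the resulting text.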

Code

skit-ai/speech-to-intent-dataset official

Tasks

Automatic Speech Recognition (ASR)

Intent Classification

Speech Intent Classification

Speech Recognition

Spoken Language Understanding

Datasets

Introduced in the Paper:

Skit-S2I

Used in the Paper:

SLURP

Fluent Speech Commands

Results from the Paper

Ranked #1 on Speech Intent Classification on Skit-S2I
