Spoken Language Understanding
118 papers with code • 5 benchmarks • 14 datasets
Libraries
Use these libraries to find Spoken Language Understanding models and implementations.
Latest papers
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
One is a cascaded approach, in which the outputs (tokens or hidden states) of a separately trained speech recognition system are used as inputs to the LLM; this limits the potential to model alignment between speech and text.
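The cascaded approach mentioned above can be sketched as two independent stages chained by plain text. This is a hedged illustration, not the paper's implementation; `run_asr` and `run_llm` are hypothetical stand-ins for a trained recognizer and a language model.

```python
# Hedged sketch of a cascaded SLU pipeline: speech -> ASR transcript -> LLM.
# Both functions below are placeholders, not real APIs.

def run_asr(audio_waveform):
    # Placeholder: a real ASR system would decode the waveform to text here.
    return "book a table for two at seven"

def run_llm(prompt):
    # Placeholder: a real LLM would interpret the prompt here.
    return {"intent": "BookRestaurant"}

def cascaded_slu(audio_waveform):
    transcript = run_asr(audio_waveform)                      # stage 1: speech -> text
    return run_llm(f"Classify the intent: {transcript}")      # stage 2: text -> semantics
```

Because the LLM only ever sees the transcript, any speech information not captured in the text (prosody, recognition errors) is lost, which is the alignment limitation the paper points out.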
Joint Multiple Intent Detection and Slot Filling with Supervised Contrastive Learning and Self-Distillation
The results also demonstrate the contributions of both bidirectional design and the training method to the accuracy improvement.
ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding
It usually includes slot filling and intent detection (SFID) tasks, which aim at semantic parsing of utterances.
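The SFID formulation can be made concrete with a toy example (invented here for illustration, not taken from any cited paper): each utterance receives a single sentence-level intent label, while slot filling assigns a BIO tag to every token.

```python
# Toy SFID example: one intent label per utterance, one BIO slot tag per token.
utterance = "play jazz music by miles davis".split()
intent = "PlayMusic"  # sentence-level intent label (hypothetical label set)
slots = ["O", "B-genre", "O", "O", "B-artist", "I-artist"]

# Slot filling is token-level sequence tagging, so lengths must match.
assert len(slots) == len(utterance)

# Pair tokens with their slot tags for inspection.
tagged = list(zip(utterance, slots))
```

Joint models exploit the fact that the two labels are correlated (a `B-genre` slot makes `PlayMusic` more likely) rather than predicting them independently.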
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition.
ITALIC: An Italian Intent Classification Dataset
Recent large-scale Spoken Language Understanding datasets focus predominantly on English and do not account for language-specific phenomena such as particular phonemes or words in different lects.
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding
The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments.
Zero-Shot End-to-End Spoken Language Understanding via Cross-Modal Selective Self-Training
End-to-end (E2E) spoken language understanding (SLU) is constrained by the cost of collecting speech-semantics pairs, especially when label domains change.
Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding
The pre-trained speech encoder wav2vec 2.0 performs very well on various spoken language understanding (SLU) tasks.
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU).