Spoken Language Understanding
118 papers with code • 5 benchmarks • 14 datasets
Latest papers with no code
Creating Spoken Dialog Systems in Ultra-Low Resourced Settings
We build on existing lightweight models for intent classification in Flemish; our main contribution is applying augmentation techniques at two levels -- the voice level and the phonetic-transcript level -- to counter the scarcity of labeled data in low-resource languages.
Leveraging cache to enable SLU on tiny devices
Our idea is simple: let the device match new inputs against cached results, and only offload unmatched inputs to the cloud for full inference.
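The match-or-offload idea above can be sketched in a few lines. This is an illustrative sketch only, assuming approximate matching via quantized feature signatures and a FIFO-evicted cache; the class and function names (`SLUCache`, `classify`, `cloud_infer`) are hypothetical, not from the paper.

```python
import hashlib

class SLUCache:
    """Tiny on-device cache mapping input signatures to SLU results (illustrative)."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.store = {}  # signature -> cached result

    def _signature(self, audio_features):
        # Stand-in for a learned matcher: hash coarsely quantized features,
        # so near-identical inputs collide on the same cache entry.
        quantized = tuple(round(x, 1) for x in audio_features)
        return hashlib.md5(repr(quantized).encode()).hexdigest()

    def lookup(self, audio_features):
        return self.store.get(self._signature(audio_features))

    def insert(self, audio_features, result):
        if len(self.store) >= self.capacity:
            self.store.pop(next(iter(self.store)))  # FIFO eviction
        self.store[self._signature(audio_features)] = result


def classify(audio_features, cache, cloud_infer):
    """Serve from cache when possible; offload only unmatched inputs."""
    cached = cache.lookup(audio_features)
    if cached is not None:
        return cached                     # served on-device
    result = cloud_infer(audio_features)  # full inference in the cloud
    cache.insert(audio_features, result)
    return result
```

A second, near-identical input then hits the cache and never reaches the cloud, which is the latency and energy win on tiny devices.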
Co-guiding for Multi-intent Spoken Language Understanding
For the first stage, we propose single-task supervised contrastive learning; for the second stage, we propose co-guiding supervised contrastive learning, which incorporates the mutual guidance between the two tasks into the contrastive learning procedure.
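For readers unfamiliar with the supervised contrastive objective that both stages build on, here is a minimal NumPy sketch of the generic supervised contrastive (SupCon-style) loss. This shows only the single-task form; the paper's co-guiding variant, which couples the intent and slot tasks, is not reproduced here, and the function name is illustrative.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss: pull same-label embeddings
    together, push different-label embeddings apart (illustrative sketch)."""
    # L2-normalize so similarity is cosine similarity.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    total = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        denom = sum(np.exp(sim[i, j]) for j in range(n) if j != i)
        # Average negative log-likelihood of each positive against all others.
        total += -sum(np.log(np.exp(sim[i, j]) / denom)
                      for j in positives) / len(positives)
    return total / n
```

When same-label points sit close together in embedding space, the loss is low; mismatched label assignments drive it up, which is what the training signal exploits.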
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding
Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models.
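The mutual-learning idea above is commonly formulated as each model adding a KL term pulling its predictions toward the other's. The sketch below assumes that standard formulation (cross-entropy plus a symmetric pair of KL regularizers); the function names and the weighting scheme are illustrative, not taken from ML-LMCL.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q, eps=1e-12):
    """KL divergence KL(p || q) for discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mutual_learning_losses(logits_manual, logits_asr, label, alpha=0.5):
    """Losses for two SLU models trained in parallel: one on manual
    transcripts, one on ASR transcripts, each regularized toward the other."""
    p = softmax(logits_manual)  # model on manual transcripts
    q = softmax(logits_asr)     # model on ASR transcripts
    ce_p = -np.log(p[label])    # each model's own task loss
    ce_q = -np.log(q[label])
    # Knowledge is shared iteratively through the KL terms.
    loss_p = ce_p + alpha * kl(q, p)
    loss_q = ce_q + alpha * kl(p, q)
    return loss_p, loss_q
```

When the two models agree, the KL terms vanish and only the task losses remain; disagreement on ASR-corrupted inputs is exactly where the extra gradient signal helps robustness.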
Generalized zero-shot audio-to-intent classification
Our multimodal training approach improves zero-shot intent classification accuracy on unseen intents by 2.75% on SLURP and by 18.2% on an internal goal-oriented dialog dataset, compared to audio-only training.
Toward Joint Language Modeling for Speech Units and Text
However, in the field of language modeling, very little effort has been made to model them jointly.
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space.
Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis
For SLU, LaSyn improves our E2E baseline by an absolute 4.1% in intent classification accuracy and 3.8% in slot-filling SLU-F1 on SLURP, and by an absolute 4.49% and 2.25% in exact match (EM) and EM-Tree accuracies on STOP, respectively.
Continual Contrastive Spoken Language Understanding
In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting, and we propose COCONUT, a CIL method that combines experience replay with contrastive learning.
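The experience-replay half of that combination can be sketched concisely. This is a generic reservoir-sampled replay buffer, assumed as one standard way to implement replay in CIL; COCONUT's contrastive losses are not reproduced, and the class and method names are illustrative.

```python
import random

class ReplayBuffer:
    """Keeps a small uniform sample of past-task examples and mixes
    them into batches for the current task (illustrative sketch)."""

    def __init__(self, capacity=200, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling: every example in the stream ends up in the
        # buffer with equal probability, without storing the whole stream.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def mixed_batch(self, new_batch, replay_size=4):
        # Rehearse old classes alongside the new task's batch.
        k = min(replay_size, len(self.buffer))
        return new_batch + self.rng.sample(self.buffer, k)
```

Mixing replayed examples into each new task's batches is what counters catastrophic forgetting of earlier intent classes as new ones arrive.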
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions
Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models.