A Joint Learning Framework With BERT for Spoken Language Understanding

Intent classification and slot filling are two essential tasks in spoken language understanding, and joint learning has recently been shown to be effective for both. However, most joint learning methods share parameters only at the surface level rather than the semantic level, and they suffer from small-scale human-labeled training data, which leads to poor generalization, especially for rare words. In this paper, we propose a novel encoder-decoder multi-task learning model that jointly trains the intent classification and slot filling tasks. The encoder maps the input sequence to contextual representations using Bidirectional Encoder Representations from Transformers (BERT). The decoder operates in two stages: in the first stage, an intent classification decoder detects the user's intent; in the second stage, the intent contextual information is fed into the slot filling decoder to predict a semantic concept tag for each word. We conduct experiments on three popular benchmark datasets: ATIS, Snips, and the Facebook multilingual task-oriented dataset. The experimental results show that our proposed model outperforms previous approaches and achieves new state-of-the-art results on all three datasets.
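The two-stage decoding described above can be sketched in a few lines. This is a minimal illustrative NumPy mock, not the paper's implementation: the BERT encoder is replaced by random contextual vectors, the weight matrices are untrained, and the conditioning scheme (concatenating an intent embedding to each token representation before slot tagging) is one plausible way to "leverage intent contextual information" as the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, HIDDEN, N_INTENTS, N_SLOTS = 5, 8, 3, 4

# Stand-in for BERT contextual encodings of a 5-token utterance
# (in the paper these come from a pretrained BERT encoder).
H = rng.normal(size=(SEQ_LEN, HIDDEN))

# Stage 1: intent classification from a pooled utterance representation
# (hypothetical untrained projection, for illustration only).
W_intent = rng.normal(size=(HIDDEN, N_INTENTS))
intent_logits = H.mean(axis=0) @ W_intent
intent_id = int(np.argmax(intent_logits))

# Stage 2: slot filling conditioned on the predicted intent. An intent
# embedding is concatenated to every token representation before tagging.
intent_emb = rng.normal(size=(N_INTENTS, HIDDEN))[intent_id]
W_slot = rng.normal(size=(2 * HIDDEN, N_SLOTS))
slot_inputs = np.concatenate([H, np.tile(intent_emb, (SEQ_LEN, 1))], axis=1)
slot_logits = slot_inputs @ W_slot
slot_tags = slot_logits.argmax(axis=1)  # one semantic concept tag per word
```

In a real model, both stages would be trained jointly with a shared loss, so that intent supervision shapes the token representations used for slot filling.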



