Context-Aware Language Modeling for Goal-Oriented Dialogue Systems

ACL ARR November 2021 · Anonymous

Goal-oriented dialogue systems have long faced a trade-off between fluent language generation and task-specific control. While supervised learning with large language models can produce realistic responses, it remains an open question how to steer those responses toward completing a specific task without sacrificing language quality. In this work, we view a goal-oriented dialogue system as a reinforcement learning (RL) problem and turn a supervised language model into both a dynamics model and a behavioral cloning policy in a partially observable Markov decision process. This view allows RL techniques such as task relabeling and goal-conditioned policies to be naturally adopted as forms of data augmentation and task-specific fine-tuning of language models. We evaluate our method, Context-Aware Language Models (CALM), on a practical flight-booking task using AirDialogue. Empirically, CALM outperforms the previous state-of-the-art method by more than 10% in task success, achieving human-level task performance on this dataset.
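To make the relabeling idea concrete, below is a minimal Python sketch of hindsight task relabeling and goal-conditioned serialization for language-model fine-tuning, under stated assumptions: the `Dialogue` fields, the `[goal]`/`[dialogue]` markers, and the example data are hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical sketch of hindsight task relabeling for goal-conditioned
# dialogue fine-tuning. All names here are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Dialogue:
    turns: list[str]        # alternating agent/customer utterances
    intended_goal: str      # goal the trajectory was originally conditioned on
    achieved_goal: str      # outcome the trajectory actually reached


def relabel(dialogue: Dialogue) -> Dialogue:
    """Hindsight relabeling: substitute the achieved outcome for the
    intended goal, turning a failed trajectory into a successful
    training example for the goal it actually accomplished."""
    return Dialogue(dialogue.turns, dialogue.achieved_goal, dialogue.achieved_goal)


def to_training_example(dialogue: Dialogue) -> str:
    """Serialize a goal-conditioned trajectory for LM fine-tuning:
    prepending the goal lets the model learn p(turns | goal)."""
    return f"[goal] {dialogue.intended_goal} [dialogue] " + " ".join(dialogue.turns)


# Usage: augment the supervised corpus with relabeled trajectories.
raw = [
    Dialogue(
        turns=["Agent: Hello, how can I help?", "Customer: I need a flight to JFK."],
        intended_goal="book JFK 2021-11-02",
        achieved_goal="no flight found",
    )
]
augmented = raw + [relabel(d) for d in raw]
corpus = [to_training_example(d) for d in augmented]
print(corpus[1])
```

Conditioning on the relabeled goal in this way would let every logged dialogue serve as a positive example for some goal, which is the data-augmentation role the abstract attributes to task relabeling.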


Datasets

AirDialogue

