Sequence Labeling Approach to the Task of Sentence Boundary Detection

One key to enabling chatbots to communicate with humans more naturally is the ability to handle long and complex user utterances. To this end, we propose integrating a Sentence Boundary Detection (SBD) module into the chatbot architecture. The module takes as input a user utterance produced by an automatic speech recognition device, in which sentence boundaries are not available, and outputs the corresponding list of punctuated sentences for downstream modules such as Intent Detection, Topic Classification, Sentiment Analysis, Named Entity Recognition, and Coreference Recognition. To address the SBD task, we reformulate it as a sequence labeling task, so that both deep neural network models (e.g., Bi-directional Long Short-Term Memory, Convolutional Neural Network) and structured prediction models (e.g., Hidden Markov Model, Maximum Entropy Model, Conditional Random Field) can be leveraged. Following this reformulation, we built a hybrid deep neural network model and achieved good performance on both the CornellMovie-Dialog and DailyDialog datasets.
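To make the reformulation concrete, here is a minimal sketch of how punctuated sentences could be turned into a token sequence with one label per token, and how predicted labels could be turned back into sentences. The tag scheme ("O" for a token inside a sentence, otherwise the punctuation mark that closes the sentence), the lowercasing step, and the function names `to_labels` and `from_labels` are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Tuple

# Assumed tag set: "O" marks a token inside a sentence; any other label is
# the punctuation mark that closes the sentence ending at that token.
END_MARKS = ".?!"


def to_labels(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten punctuated sentences into unpunctuated tokens plus labels."""
    tokens: List[str] = []
    labels: List[str] = []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        last = words[-1]
        # Default to a period when the sentence has no terminal mark.
        mark = last[-1] if last[-1] in END_MARKS else "."
        words[-1] = last.rstrip(END_MARKS)
        # Lowercase to mimic uncased ASR output (an assumption).
        words = [w.lower() for w in words if w]
        if not words:
            continue
        tokens.extend(words)
        labels.extend(["O"] * (len(words) - 1) + [mark])
    return tokens, labels


def from_labels(tokens: List[str], labels: List[str]) -> List[str]:
    """Rebuild punctuated sentences from tokens and per-token labels."""
    sentences: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label != "O":
            sentences.append(" ".join(current) + label)
            current = []
    if current:
        # Trailing tokens with no predicted boundary are kept unpunctuated.
        sentences.append(" ".join(current))
    return sentences


tokens, labels = to_labels(["How are you?", "I am fine."])
restored = from_labels(tokens, labels)
```

Under this framing, any sequence labeling model that emits one tag per token can fill the role of the predictor, and `from_labels` would be applied to its output to produce the sentence list passed to downstream modules.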
