Sequence Labeling Approach to the Task of Sentence Boundary Detection

ICMLSC 2020: Proceedings of the 4th International Conference on Machine Learning and Soft Computing 2020 · The Anh Le

One key to enabling chatbots to communicate with humans more naturally is the ability to handle long and complex user utterances. To this end, we propose integrating a Sentence Boundary Detection (SBD) module into the chatbot architecture. The module takes as input a user's utterance from an automatic speech recognition device, in which sentence boundaries are not available, and outputs the corresponding list of punctuated sentences for downstream modules such as Intent Detection, Topic Classification, Sentiment Analysis, Named Entity Recognition, and Coreference Recognition. To address the SBD task, we reformulate it as a sequence labeling task, so that both deep neural network models (e.g., Bi-directional Long Short-Term Memory, Convolutional Neural Network) and structured prediction models (e.g., Hidden Markov Model, Maximum Entropy Model, Conditional Random Field) can be leveraged. After reformulating the SBD task, we built a hybrid deep neural network model and achieved good performance on both the CornellMovie-Dialog and DailyDialog datasets.
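The reformulation can be illustrated with a minimal sketch. It assumes a per-token label scheme that the abstract does not specify: each token is tagged "O" if no sentence ends after it, or with the punctuation mark that closes the sentence. The label set and the helper names `sentences_to_labels` and `labels_to_sentences` are hypothetical and are not taken from the paper. Any sequence labeler (BiLSTM, CRF, and so on) could then be trained to predict the labels from the unpunctuated token stream.

```python
from typing import List, Tuple

# Assumed label set (not from the paper): "O" means no boundary after the
# token; any other label is the punctuation mark that ends the sentence there.
BOUNDARY_MARKS = ".?!"


def sentences_to_labels(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Turn punctuated sentences into (tokens, labels) training data.

    The tokens imitate ASR output: lowercased, with no sentence-final marks.
    """
    tokens: List[str] = []
    labels: List[str] = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue
        # Fall back to a period when the sentence has no final mark.
        mark = sentence[-1] if sentence[-1] in BOUNDARY_MARKS else "."
        words = sentence.rstrip(BOUNDARY_MARKS).lower().split()
        if not words:
            continue
        tokens.extend(words)
        labels.extend(["O"] * (len(words) - 1) + [mark])
    return tokens, labels


def labels_to_sentences(tokens: List[str], labels: List[str]) -> List[str]:
    """Decode predicted labels back into a list of punctuated sentences."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    sentences: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        current.append(token)
        if label != "O":
            sentences.append(" ".join(current) + label)
            current = []
    if current:
        # Trailing tokens without a predicted boundary are kept unpunctuated.
        sentences.append(" ".join(current))
    return sentences


tokens, labels = sentences_to_labels(["How are you?", "I am fine."])
restored = labels_to_sentences(tokens, labels)
```

In this sketch the encoding step produces training data and the decoding step is what an SBD module would run on the labeler's predictions before passing sentences to downstream modules. Casing is not restored; that would need a separate step.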
