Speaker Clustering in Textual Dialogue with Utterance Correlation and Cross-corpus Dialogue Act Supervision

ACL ARR January 2022 · Anonymous ·

We propose a textual dialogue speaker clustering model, which groups the utterances of a multi-party dialogue without speaker annotations, so that the real speakers are identical inside each cluster. We find that, even without knowing the speakers, the interactions between utterances are still implied in the text. Such interactions suggest the correlations of the speakers. In this work, we model the semantic content of an utterance with a pre-trained language model, and the correlations of speakers with an utterance-level pairwise matrix. The semantic content representation can be further enhanced by additional cross-corpus supervised dialogue act modeling. The speaker labels are finally generated by spectral clustering. Experiment shows that our model outperforms the sequence classification baseline, and benefits from the set-specific dialogue act classification auxiliary task. We also discuss the detail of correlation modeling and step-wise training process.

PDF Abstract