Towards the Data-driven System for Rhetorical Parsing of Russian Texts

We present the results of the first experimental evaluation of machine learning models trained on Ru-RSTreebank, the first Russian corpus annotated within the RST framework. Various lexical, quantitative, morphological, and semantic features were used. For rhetorical relation classification, an ensemble of a CatBoost model with selected features and a linear SVM model achieves the best score (macro F1 = 54.67 ± 0.38). We find that most of the important features for rhetorical relation classification are related to discourse connectives derived from the connectives lexicon for Russian and from other sources.
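To make the reported metric concrete, the following is a minimal sketch of how two classifiers' outputs might be combined and then scored with macro F1. The abstract does not say how the ensemble combines its two models, so the probability averaging (soft voting) here is an assumption, and the function names, relation labels, and probabilities are hypothetical illustrations rather than the authors' code or data.

```python
from typing import List, Sequence


def soft_vote(probs_a: Sequence[Sequence[float]],
              probs_b: Sequence[Sequence[float]],
              labels: Sequence[str]) -> List[str]:
    """Average two models' class probabilities and pick the top class per sample.

    Assumption: equal-weight soft voting; the paper may combine models differently.
    """
    predictions = []
    for row_a, row_b in zip(probs_a, probs_b):
        averaged = [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        best = max(range(len(labels)), key=lambda i: averaged[i])
        predictions.append(labels[best])
    return predictions


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in either sequence."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two relation labels, three samples.
labels = ["cause", "contrast"]
probs_model_a = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
probs_model_b = [[0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
y_true = ["cause", "contrast", "cause"]

y_pred = soft_vote(probs_model_a, probs_model_b, labels)
score = macro_f1(y_true, y_pred)
```

Because macro F1 weights every relation class equally, rare relations affect the score as much as frequent ones, which is why it is a common choice for imbalanced label sets such as rhetorical relations.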
