DEplain-APA-sent: A German Parallel Corpus for Sentence Simplification on News Texts

DEplain is a new dataset of parallel, professionally written and manually aligned simplifications in plain German “plain DE” (or in German: “Einfache Sprache”). DEplain consists of four main subcorpora: DEplain-APA-doc, DEplain-APA-sent, DEplain-web-doc, and DEplain-web-sent.

DEplain-APA-sent consists of approx. 500 news document pairs and approx. 13k sentence pairs. The sentence pairs are all manually aligned. The data is available upon request, please see https://doi.org/10.5281/zenodo.7674560 for more information. The corpus can be used for German text simplification, or in more detail sentence simplification.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


Modalities


Languages