DEplain is a new dataset of parallel, professionally written and manually aligned simplifications in plain German (“plain DE”; in German: “Einfache Sprache”). DEplain consists of four main subcorpora: DEplain-APA-doc, DEplain-APA-sent, DEplain-web-doc, and DEplain-web-sent.
DEplain-APA-sent consists of approx. 500 news document pairs and approx. 13k sentence pairs. The sentence pairs are all manually aligned. The data is available upon request; see https://doi.org/10.5281/zenodo.7674560 for more information. The corpus can be used for German text simplification, and more specifically for sentence simplification.