Search Results for author: Thomas Proisl

Found 16 papers, 3 papers with code

EmpiriST Corpus 2.0: Adding Manual Normalization, Lemmatization and Semantic Tagging to a German Web and CMC Corpus

no code implementations • LREC 2020 • Thomas Proisl, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Andreas Blombach, Stefan Evert

The EmpiriST corpus (Beißwenger et al., 2016) is a manually tokenized and part-of-speech tagged corpus of approximately 23,000 tokens of German Web and CMC (computer-mediated communication) data.

Lemmatization

A Corpus of German Reddit Exchanges (GeRedE)

no code implementations • LREC 2020 • Andreas Blombach, Natalie Dykes, Philipp Heinrich, Besim Kabashi, Thomas Proisl

GeRedE is a 270 million token German CMC corpus containing approximately 380,000 submissions and 6,800,000 comments posted on Reddit between 2010 and 2018.

Efficient Dependency Graph Matching with the IMS Open Corpus Workbench

no code implementations • LREC 2012 • Thomas Proisl, Peter Uhrig

State-of-the-art dependency representations such as the Stanford Typed Dependencies may represent the grammatical relations in a sentence as directed, possibly cyclic graphs.

Dependency Parsing, Graph Matching
