Search Results for author: Jonas Kuhn

Found 81 papers, 11 papers with code

Applying Occam’s Razor to Transformer-Based Dependency Parsing: What Works, What Doesn’t, and What is Really Necessary

no code implementations ACL (IWPT) 2021 Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn

We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study.

Dependency Parsing Word Embeddings +1

Investigating Active Learning Sampling Strategies for Extreme Multi Label Text Classification

no code implementations LREC 2022 Lukas Wertz, Katsiaryna Mirylenka, Jonas Kuhn, Jasmina Bogojeska

Large scale, multi-label text datasets with high numbers of different classes are expensive to annotate, even more so if they deal with domain specific language.

Active Learning Extreme Multi-Label Classification +3

Improving Neural Political Statement Classification with Class Hierarchical Information

no code implementations Findings (ACL) 2022 Erenay Dayanik, Andre Blessing, Nico Blokker, Sebastian Haunss, Jonas Kuhn, Gabriella Lapesa, Sebastian Padó

Many tasks in text-based computational social science (CSS) involve the classification of political statements into categories based on a domain-specific codebook.

Classification

Using Hierarchical Class Structure to Improve Fine-Grained Claim Classification

no code implementations ACL (spnlp) 2021 Erenay Dayanik, Andre Blessing, Nico Blokker, Sebastian Haunss, Jonas Kuhn, Gabriella Lapesa, Sebastian Padó

The analysis of public debates crucially requires the classification of political demands according to hierarchical claim ontologies (e.g., for immigration, a supercategory “Controlling Migration” might have subcategories “Asylum limit” or “Border installations”).

Classification

Identifying and Handling Cross-Treebank Inconsistencies in UD: A Pilot Study

1 code implementation UDW (COLING) 2020 Tillmann Dönicke, Xiang Yu, Jonas Kuhn

The Universal Dependencies treebanks are a still-growing collection of treebanks for a wide range of languages, all annotated with a common inventory of dependency relations.

Between welcome culture and border fence. A dataset on the European refugee crisis in German newspaper reports

no code implementations 19 Nov 2021 Nico Blokker, André Blessing, Erenay Dayanik, Jonas Kuhn, Sebastian Padó, Gabriella Lapesa

Besides the released resources and the case-study, our contribution is also methodological: we talk the reader through the steps from a newspaper article to a discourse network, demonstrating that there is not just one discourse network for the German migration debate, but multiple ones, depending on the topic of interest (political actors, policy fields, time spans).

Cultural Vocal Bursts Intensity Prediction

Negation-Instance Based Evaluation of End-to-End Negation Resolution

1 code implementation CoNLL (EMNLP) 2021 Elizaveta Sineva, Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn

In this paper, we revisit the task of negation resolution, which includes the subtasks of cue detection (e.g., "not", "never") and scope resolution.

Negation

Modeling Sense Structure in Word Usage Graphs with the Weighted Stochastic Block Model

1 code implementation Joint Conference on Lexical and Computational Semantics 2021 Dominik Schlechtweg, Enrique Castaneda, Jonas Kuhn, Sabine Schulte im Walde

We suggest modeling human-annotated Word Usage Graphs capturing fine-grained semantic proximity distinctions between word uses with a Bayesian formulation of the Weighted Stochastic Block Model, a generative model for random graphs popular in biology, physics and social sciences.

Stochastic Block Model

Lexical Semantic Change Discovery

1 code implementation ACL 2021 Sinan Kurtyigit, Maike Park, Dominik Schlechtweg, Jonas Kuhn, Sabine Schulte im Walde

While there is a large amount of research in the field of Lexical Semantic Change Detection, only a few approaches go beyond a standard benchmark evaluation of existing models.

Change Detection

Real-Valued Logics for Typological Universals: Framework and Application

no code implementations COLING 2020 Tillmann Dönicke, Xiang Yu, Jonas Kuhn

This paper proposes a framework for the expression of typological statements which uses real-valued logics to capture the empirical truth value (truth degree) of a formula on a given data source, e.g., a collection of multilingual treebanks with comparable annotation.

Applying Occam's Razor to Transformer-Based Dependency Parsing: What Works, What Doesn't, and What is Really Necessary

2 code implementations 23 Oct 2020 Stefan Grünewald, Annemarie Friedrich, Jonas Kuhn

We find that the choice of pre-trained embeddings has by far the greatest impact on parser performance and identify XLM-R as a robust choice across the languages in our study.

Dependency Parsing Part-Of-Speech Tagging +2

Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection

no code implementations WS 2020 Xiang Yu, Ngoc Thang Vu, Jonas Kuhn

We present an iterative data augmentation framework, which trains and searches for an optimal ensemble and simultaneously annotates new training data in a self-training style.

Data Augmentation Morphological Inflection

CCOHA: Clean Corpus of Historical American English

no code implementations LREC 2020 Reem Alatrash, Dominik Schlechtweg, Jonas Kuhn, Sabine Schulte im Walde

Modelling language change is an increasingly important area of interest within the fields of sociolinguistics and historical linguistics.

GRAIN-S: Manually Annotated Syntax for German Interviews

no code implementations LREC 2020 Agnieszka Falenska, Zoltán Czesznak, Kerstin Jung, Moritz Völkel, Wolfgang Seeker, Jonas Kuhn

The dataset extends an existing corpus GRAIN and comes with constituency and dependency trees for six interviews.

Head-First Linearization with Tree-Structured Representation

no code implementations WS 2019 Xiang Yu, Agnieszka Falenska, Ngoc Thang Vu, Jonas Kuhn

We present a dependency tree linearization model with two novel components: (1) a tree-structured encoder based on bidirectional Tree-LSTM that propagates information first bottom-up then top-down, which allows each token to access information from the entire tree; and (2) a linguistically motivated head-first decoder that emphasizes the central role of the head and linearizes the subtree by incrementally attaching the dependents on both sides of the head.

Learning the Dyck Language with Attention-based Seq2Seq Models

no code implementations WS 2019 Xiang Yu, Ngoc Thang Vu, Jonas Kuhn

The generalized Dyck language has been used to analyze the ability of Recurrent Neural Networks (RNNs) to learn context-free grammars (CFGs).

An Environment for Relational Annotation of Political Debates

no code implementations ACL 2019 Andre Blessing, Nico Blokker, Sebastian Haunss, Jonas Kuhn, Gabriella Lapesa, Sebastian Padó

This paper describes the MARDY corpus annotation environment developed for a collaboration between political science and computational linguistics.

BIG-bench Machine Learning Management

The (Non-)Utility of Structural Features in BiLSTM-based Dependency Parsers

no code implementations ACL 2019 Agnieszka Falenska, Jonas Kuhn

Classical non-neural dependency parsers put considerable effort into the design of feature functions.

Approximate Dynamic Oracle for Dependency Parsing with Reinforcement Learning

no code implementations WS 2018 Xiang Yu, Ngoc Thang Vu, Jonas Kuhn

We present a general approach with reinforcement learning (RL) to approximate dynamic oracles for transition systems where exact dynamic oracles are difficult to derive.

Dependency Parsing Imitation Learning +4

Bridging resolution: Task definition, corpus resources and rule-based experiments

no code implementations COLING 2018 Ina Roesiger, Arndt Riester, Jonas Kuhn

Recent work on bridging resolution has so far been based on the corpus ISNotes (Markert et al. 2012), as this was the only corpus available with unrestricted bridging annotation.

Coreference Resolution

Supervised Rhyme Detection with Siamese Recurrent Networks

no code implementations COLING 2018 Thomas Haider, Jonas Kuhn

We present the first supervised approach to rhyme detection with Siamese Recurrent Networks (SRN) that offer near perfect performance (97% accuracy) with a single model on rhyme pairs for German, English and French, allowing future large scale analyses.

Binary Classification General Classification

Polyglot Semantic Parsing in APIs

2 code implementations NAACL 2018 Kyle Richardson, Jonathan Berant, Jonas Kuhn

Traditional approaches to semantic parsing (SP) work by training individual models for each available parallel dataset of text-meaning pairs.

Semantic Parsing Translation

The Code2Text Challenge: Text Generation in Source Libraries

no code implementations WS 2017 Kyle Richardson, Sina Zarrieß, Jonas Kuhn

We propose a new shared task for tactical data-to-text generation in the domain of source code libraries.

Data-to-Text Generation

The Code2Text Challenge: Text Generation in Source Code Libraries

1 code implementation 31 Jul 2017 Kyle Richardson, Sina Zarrieß, Jonas Kuhn

We propose a new shared task for tactical data-to-text generation in the domain of source code libraries.

Data-to-Text Generation

Function Assistant: A Tool for NL Querying of APIs

2 code implementations EMNLP 2017 Kyle Richardson, Jonas Kuhn

For a given text query and background API, the tool finds candidate functions by performing a translation from the text to known representations in the API using the semantic parsing approach of Richardson and Kuhn (2017).

Natural Language Queries Semantic Parsing +1

Learning Semantic Correspondences in Technical Documentation

no code implementations ACL 2017 Kyle Richardson, Jonas Kuhn

We consider the problem of translating high-level textual descriptions to formal representations in technical documentation as part of an effort to model the meaning of such documentation.

Semantic Parsing

Flexible and Reliable Text Analytics in the Digital Humanities -- Some Methodological Considerations

no code implementations WS 2016 Jonas Kuhn

I start this talk by sketching some sample scenarios of Digital Humanities projects which involve various Humanities and Social Science disciplines, noting that the potential for a meaningful contribution to higher-level questions is highest when the employed language technological models are carefully tailored both (a) to characteristics of the given target corpus, and (b) to relevant analytical subtasks feeding the discipline-specific research questions.

Cultural Vocal Bursts Intensity Prediction

Named Entity Disambiguation for little known referents: a topic-based approach

no code implementations COLING 2016 Andrea Glaser, Jonas Kuhn

We propose an approach to Named Entity Disambiguation that avoids a problem of standard work on the task (likewise affecting fully supervised, weakly supervised, or distantly supervised machine learning techniques): the treatment of name mentions referring to people with no (or very little) coverage in the textual training data is systematically incorrect.

Entity Disambiguation Entity Linking +1

IMS HotCoref DE: A Data-driven Co-reference Resolver for German

no code implementations LREC 2016 Ina Roesiger, Jonas Kuhn

This paper presents a data-driven co-reference resolution system for German that has been adapted from IMS HotCoref, a co-reference resolver for English.

Learning from Within? Comparing PoS Tagging Approaches for Historical Text

no code implementations LREC 2016 Sarah Schulz, Jonas Kuhn

In this paper, we investigate unsupervised and semi-supervised methods for part-of-speech (PoS) tagging in the context of historical German text.

Part-Of-Speech Tagging POS +2

Learning to Make Inferences in a Semantic Parsing Task

no code implementations TACL 2016 Kyle Richardson, Jonas Kuhn

We introduce a new approach to training a semantic parser that uses textual entailment judgements as supervision.

Machine Translation Natural Language Inference +4

A Corpus of Comparisons in Product Reviews

no code implementations LREC 2014 Wiltrud Kessler, Jonas Kuhn

For each sentence we have annotated detailed information about the comparisons it contains: The comparative predicate that expresses the comparison, the type of the comparison, the two entities that are being compared, and the aspect they are compared in.

Opinion Mining Sentence +1

Converting an HPSG-based Treebank into its Parallel Dependency-based Treebank

no code implementations LREC 2014 Masood Ghayoomi, Jonas Kuhn

In this paper, we introduce an algorithm to convert an HPSG-based treebank into its parallel dependency-based treebank.

Textual Emigration Analysis (TEA)

no code implementations LREC 2014 Andre Blessing, Jonas Kuhn

We present a web-based application which is called TEA (Textual Emigration Analysis) as a showcase that applies textual analysis for the humanities.

A Corpus-based Study of the German Recipient Passive

no code implementations LREC 2012 Patrick Ziering, Sina Zarrieß, Jonas Kuhn

In this paper, we investigate the usage of a non-canonical German passive alternation for ditransitive verbs, the recipient passive, in naturally occurring corpus data.

Making Ellipses Explicit in Dependency Conversion for a German Treebank

no code implementations LREC 2012 Wolfgang Seeker, Jonas Kuhn

We present a carefully designed dependency conversion of the German phrase-structure treebank TiGer that explicitly represents verb ellipses by introducing empty nodes into the tree.
