no code implementations • EMNLP (LaTeCHCLfL, CLFL, LaTeCH) 2021 • Jörg Wöckener, Thomas Haider, Tristan Miller, The-Khang Nguyen, Thanh Tung Linh Nguyen, Minh Vu Pham, Jonas Belouadi, Steffen Eger
In this work, we design an end-to-end model for poetry generation based on conditioned recurrent neural network (RNN) language models whose goal is to learn stylistic features (poem length, sentiment, alliteration, and rhyming) from examples alone.
no code implementations • EMNLP (CODI) 2020 • Haixia Chai, Wei Zhao, Steffen Eger, Michael Strube
A substantial overlap of coreferent mentions in the CoNLL dataset magnifies the recent progress on coreference resolution.
no code implementations • INLG (ACL) 2021 • Christian Richter, Yanran Chen, Steffen Eger
This paper describes our contribution to the Shared Task ReproGen by Belz et al. (2021), which investigates the reproducibility of human evaluations in the context of Natural Language Generation.
no code implementations • WMT (EMNLP) 2021 • Gregor Geigle, Jonas Stadtmüller, Wei Zhao, Jonas Pfeiffer, Steffen Eger
This paper presents our submissions to the WMT2021 Shared Task on Quality Estimation, Task 1 Sentence-Level Direct Assessment.
1 code implementation • 18 Feb 2024 • Yanran Chen, Wei Zhao, Anne Breitbarth, Manuel Stoeckel, Alexander Mehler, Steffen Eger
Although we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historical data, we find that measures of syntactic language change are sensitive to the parsers involved. This cautions against using a single parser to evaluate syntactic language change, as done in previous work.
no code implementations • 7 Jan 2024 • Hoa Nguyen, Steffen Eger
Recently, it has been noted that there is a citation age bias in the Natural Language Processing (NLP) community, currently one of the fastest-growing AI subfields: the mean age of the bibliography of NLP papers has become ever younger in the last few years, leading to 'citation amnesia', in which older knowledge is increasingly forgotten.
1 code implementation • 9 Dec 2023 • Ran Zhang, Aida Kostikova, Christoph Leiter, Jonas Belouadi, Daniil Larionov, Yanran Chen, Vivian Fresen, Steffen Eger
Artificial Intelligence (AI) has witnessed rapid growth, especially in the subfields Natural Language Processing (NLP), Machine Learning (ML) and Computer Vision (CV).
1 code implementation • 30 Oct 2023 • Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger
Specifically, we propose a novel competition setting in which we select a list of allowed LLMs and disallow fine-tuning to ensure a focus on prompting.
1 code implementation • 30 Sep 2023 • Jonas Belouadi, Anne Lauscher, Steffen Eger
To address this, we propose the use of TikZ, a well-known abstract graphics language that can be compiled to vector graphics, as an intermediate representation of scientific figures.
1 code implementation • 31 Jul 2023 • Steffen Eger, Christoph Leiter, Jonas Belouadi, Ran Zhang, Aida Kostikova, Daniil Larionov, Yanran Chen, Vivian Fresen
In particular, we compile a list of the 40 most popular papers based on normalized citation counts from the first half of 2023.
1 code implementation • 22 Jun 2023 • Ran Zhang, Jihed Ouni, Steffen Eger
Additionally, we explore the potential of ChatGPT for CLCTS as a summarizer and an evaluator.
no code implementations • 22 Jun 2023 • Christoph Leiter, Piyawat Lertvittayakumjorn, Marina Fomicheva, Wei Zhao, Yang Gao, Steffen Eger
In this context, we also discuss the latest state-of-the-art approaches to explainable metrics based on generative models such as ChatGPT and GPT-4.
no code implementations • 7 Jun 2023 • Gil Rocha, Henrique Lopes Cardoso, Jonas Belouadi, Steffen Eger
We demonstrate the impact of our approach on an Argument Mining downstream task evaluated on different corpora, showing that language models can be trained to automatically fill in discourse markers across corpora and that this improves the performance of a downstream model in some, but not all, cases.
1 code implementation • 24 May 2023 • Cleo Matzken, Steffen Eger, Ivan Habernal
Protecting privacy in contemporary NLP models is gaining in importance.
no code implementations • 2 May 2023 • Anya Belz, Craig Thomson, Ehud Reiter, Gavin Abercrombie, Jose M. Alonso-Moral, Mohammad Arvan, Anouck Braggaar, Mark Cieliebak, Elizabeth Clark, Kees Van Deemter, Tanvi Dinkar, Ondřej Dušek, Steffen Eger, Qixiang Fang, Mingqi Gao, Albert Gatt, Dimitra Gkatzia, Javier González-Corbelle, Dirk Hovy, Manuela Hürlimann, Takumi Ito, John D. Kelleher, Filip Klubicka, Emiel Krahmer, Huiyuan Lai, Chris van der Lee, Yiru Li, Saad Mahamood, Margot Mieskes, Emiel van Miltenburg, Pablo Mosteiro, Malvina Nissim, Natalie Parde, Ondřej Plátek, Verena Rieser, Jie Ruan, Joel Tetreault, Antonio Toral, Xiaojun Wan, Leo Wanner, Lewis Watson, Diyi Yang
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible.
no code implementations • 20 Feb 2023 • Christoph Leiter, Ran Zhang, Yanran Chen, Jonas Belouadi, Daniil Larionov, Vivian Fresen, Steffen Eger
ChatGPT, a chatbot developed by OpenAI, has gained widespread popularity and media attention since its release in November 2022.
1 code implementation • 20 Dec 2022 • Yanran Chen, Steffen Eger
Our human evaluation suggests that our best end-to-end system performs similarly to human authors (but arguably slightly worse).
1 code implementation • 20 Dec 2022 • Christoph Leiter, Hoa Nguyen, Steffen Eger
We then combine this segment-level score with the original metric to obtain a better metric.
1 code implementation • 20 Dec 2022 • Jonas Belouadi, Steffen Eger
State-of-the-art poetry generation systems are often complex.
2 code implementations • 9 Oct 2022 • Dominik Beese, Ole Pütz, Steffen Eger
We measure support for women and migrants in German political debates over the last 155 years.
1 code implementation • 20 Sep 2022 • Daniil Larionov, Jens Grünwald, Christoph Leiter, Steffen Eger
In this work, we provide a comprehensive evaluation of efficiency for MT evaluation metrics.
1 code implementation • COLING 2022 • Doan Nam Long Vu, Nafise Sadat Moosavi, Steffen Eger
The evaluation of recent embedding-based evaluation metrics for text generation is primarily based on measuring their correlation with human evaluations on standard benchmarks.
1 code implementation • 15 Aug 2022 • Yanran Chen, Steffen Eger
Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information correctness.
1 code implementation • 30 Mar 2022 • Yanran Chen, Jonas Belouadi, Steffen Eger
We find that reproduction of claims and results often fails because of (i) heavy undocumented preprocessing involved in the metrics, (ii) missing code and (iii) reporting weaker results for the baseline metrics.
1 code implementation • 21 Mar 2022 • Christoph Leiter, Piyawat Lertvittayakumjorn, Marina Fomicheva, Wei Zhao, Yang Gao, Steffen Eger
We also provide a synthesizing overview of recent approaches to explainable machine translation metrics and discuss how they relate to those goals and properties.
2 code implementations • 28 Feb 2022 • Dominik Beese, Begüm Altunbaş, Görkem Güzeler, Steffen Eger
We annotate over 1.5k papers from NLP and ML to train a SciBERT-based model to automatically predict the stance of a paper based on its title and abstract.
1 code implementation • 21 Feb 2022 • Jonas Belouadi, Steffen Eger
We show that our fully unsupervised metrics are effective, i.e., they beat supervised competitors on 4 out of our 5 evaluation datasets.
no code implementations • 31 Jan 2022 • Wei Zhao, Steffen Eger
Multilingual representations pre-trained with monolingual data exhibit considerably unequal task performances across languages.
1 code implementation • 26 Jan 2022 • Wei Zhao, Michael Strube, Steffen Eger
Still, recent BERT-based evaluation metrics are weak at recognizing coherence and are thus unreliable for spotting the discourse-level improvements of those text generation systems.
1 code implementation • ACL 2021 • Maxime Peyrard, Wei Zhao, Steffen Eger, Robert West
Evaluation in NLP is usually done by comparing the scores of competing systems independently averaged over a common set of test instances.
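The contrast with independent averaging can be made concrete: a system can have the higher mean score yet win on only half of the shared test instances. A minimal sketch of the two views (the paired evaluation proposed in the paper is more involved, and the per-instance scores below are hypothetical):

```python
def mean_score(scores):
    """Independent aggregation: average a system's scores over all instances."""
    return sum(scores) / len(scores)

def pairwise_win_rate(a, b):
    """Paired view: fraction of shared test instances on which system a outscores b."""
    wins = sum(sa > sb for sa, sb in zip(a, b))
    return wins / len(a)

# Hypothetical per-instance scores for two systems on the same test set.
sys_a = [0.9, 0.2, 0.8, 0.1]
sys_b = [0.5, 0.4, 0.5, 0.4]
print(round(mean_score(sys_a), 2), round(mean_score(sys_b), 2))  # 0.5 0.45
print(pairwise_win_rate(sys_a, sys_b))                           # 0.5
```

Here system a looks better on average, but instance-level pairing shows the two systems trading wins evenly, which is exactly the kind of information averaging discards.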
1 code implementation • EMNLP 2021 • Marvin Kaster, Wei Zhao, Steffen Eger
Evaluation metrics are a key ingredient for progress of text generation systems.
1 code implementation • EMNLP (Eval4NLP) 2021 • Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao, Steffen Eger, Yang Gao
In this paper, we introduce the Eval4NLP-2021 shared task on explainable quality estimation.
no code implementations • 29 Sep 2021 • Wei Zhao, Steffen Eger
In this work, we analyze why previous alignments become very resource-intensive, viz., (i) the inability to sufficiently leverage data and (ii) that alignments are not trained properly.
1 code implementation • 13 Aug 2021 • Tobias Walter, Celina Kirschner, Steffen Eger, Goran Glavaš, Anne Lauscher, Simone Paolo Ponzetto
We analyze bias in historical corpora as encoded in diachronic distributional semantic models by focusing on two specific forms of bias, namely a political one (i.e., anti-communism) and a racist one (i.e., antisemitism).
1 code implementation • ACL 2021 • Alexandra Ils, Dan Liu, Daniela Grunow, Steffen Eger
We use these annotations to train a BERT model with multiple data augmentation strategies.
no code implementations • 22 Jun 2021 • Yang Li, Wei Zhao, Erik Cambria, Suhang Wang, Steffen Eger
Therefore, in this paper, we introduce a new capsule network with graph routing to learn both relationships, where capsules in each layer are treated as the nodes of a graph.
1 code implementation • Findings (ACL) 2021 • Yannik Keller, Jan Mackensen, Steffen Eger
Adversarial attacks expose important blind spots of deep learning systems.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Steffen Eger, Yannik Benz
Adversarial attacks are label-preserving modifications to inputs of machine learning classifiers designed to fool machines but not humans.
1 code implementation • SEMEVAL 2020 • David Rother, Thomas Haider, Steffen Eger
Remarkably, with only 10-dimensional mBERT embeddings (reduced from the original size of 768), our submitted model performs best on subtask 1 for English and ranks third on subtask 2 for English.
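The snippet does not say how the 768-dimensional mBERT embeddings were reduced to 10 dimensions; as one plausible approach, here is a minimal PCA sketch via SVD (the data and dimensions are illustrative, not the paper's setup):

```python
import numpy as np

def pca_reduce(embeddings: np.ndarray, n_components: int = 10) -> np.ndarray:
    """Project row-wise embeddings onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Toy stand-in for 768-dimensional mBERT sentence embeddings.
rng = np.random.default_rng(0)
emb = rng.standard_normal((100, 768))
reduced = pca_reduce(emb, n_components=10)
print(reduced.shape)  # (100, 10)
```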
no code implementations • COLING 2020 • Taraka Rama, Lisa Beinborn, Steffen Eger
We probe the layers in multilingual BERT (mBERT) for phylogenetic and geographic language signals across 100 languages and compute language distances based on the mBERT representations.
1 code implementation • COLING 2020 • Martin Kerscher, Steffen Eger
We introspect black-box sentence embeddings by conditionally generating from them with the objective to retrieve the underlying discrete sentence.
1 code implementation • Joint Conference on Lexical and Computational Semantics 2021 • Wei Zhao, Steffen Eger, Johannes Bjerva, Isabelle Augenstein
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
1 code implementation • CONLL 2020 • Steffen Eger, Johannes Daxenberger, Iryna Gurevych
We then probe embeddings in a multilingual setup with design choices that lie in a 'stable region', as we identify for English, and find that results on English do not transfer to other languages.
1 code implementation • ACL 2020 • Yang Gao, Wei Zhao, Steffen Eger
Compared to the state-of-the-art unsupervised evaluation metrics, SUPERT correlates better with human ratings by 18-39%.
1 code implementation • ACL 2020 • Wei Zhao, Goran Glavaš, Maxime Peyrard, Yang Gao, Robert West, Steffen Eger
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
1 code implementation • LREC 2020 • Thomas Haider, Steffen Eger, Evgeny Kim, Roman Klinger, Winfried Menninghaus
Thus, we conceptualize a set of aesthetic emotions that are predictive of aesthetic appreciation in the reader, and allow the annotation of multiple labels per line to capture mixed emotions within their context.
1 code implementation • WS 2019 • Thomas Haider, Steffen Eger
Due to its semantic succinctness and novelty of expression, poetry is a great test bed for semantic change analysis.
4 code implementations • IJCNLP 2019 • Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger
A robust evaluation metric has a profound impact on the development of text generation systems.
5 code implementations • ACL 2019 • Wei Zhao, Haiyun Peng, Steffen Eger, Erik Cambria, Min Yang
Obstacles hindering the development of capsule networks for challenging NLP applications include poor scalability to large output spaces and less reliable routing processes.
Ranked #1 on Text Classification on RCV1 (P@1 metric)
no code implementations • WS 2019 • Steffen Eger, Andreas Rücklé, Iryna Gurevych
Our motivation is to challenge the current evaluation of sentence embeddings and to provide an easy-to-access reference for future research.
1 code implementation • NAACL 2019 • Yang Gao, Steffen Eger, Ilia Kuznetsov, Iryna Gurevych, Yusuke Miyao
We then focus on the role of the rebuttal phase, and propose a novel task to predict after-rebuttal (i.e., final) scores from initial reviews and author responses.
1 code implementation • NAACL 2019 • Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych
Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios.
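Such visual perturbations can be illustrated with a toy character-substitution function; the look-alike table below is purely illustrative, not the character mapping used in the paper:

```python
# Toy visual perturbation: substitute characters with look-alike symbols,
# as in "leet speak". The substitution table here is illustrative only.
LOOKALIKES = {"i": "1", "e": "3", "a": "4", "o": "0", "t": "7", "s": "5"}

def perturb(text: str) -> str:
    """Replace each character with a visually similar one where available."""
    return "".join(LOOKALIKES.get(c, c) for c in text.lower())

print(perturb("leet speak"))  # l337 5p34k
```

The perturbed string stays readable to humans while its token identities change entirely, which is why such edits can fool text classifiers.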
1 code implementation • 7 Mar 2019 • Steffen Eger, Chao Li, Florian Netzer, Iryna Gurevych
By extrapolation, we predict that these topics will remain lead problems/approaches in their fields in the short- and mid-term.
1 code implementation • EMNLP 2018 • Steffen Eger, Paul Youssef, Iryna Gurevych
Activation functions play a crucial role in neural networks because they are the nonlinearities which have been attributed to the success story of deep learning.
1 code implementation • WS 2018 • Steffen Eger, Andreas Rücklé, Iryna Gurevych
We consider unsupervised cross-lingual transfer on two tasks, viz., sentence-level argumentation mining and standard POS tagging.
1 code implementation • COLING 2018 • Erik-Lân Do Dinh, Steffen Eger, Iryna Gurevych
In this paper we investigate multi-task learning for related non-literal language phenomena.
no code implementations • COLING 2018 • Erik-Lân Do Dinh, Steffen Eger, Iryna Gurevych
In this paper, we tackle four different tasks of non-literal language classification: token and construction level metaphor detection, classification of idiomatic use of infinitive-verb compounds, and classification of non-literal particle verbs.
1 code implementation • COLING 2018 • Steffen Eger, Johannes Daxenberger, Christian Stab, Iryna Gurevych
Argumentation mining (AM) requires the identification of complex discourse structures and has lately been applied with success monolingually.
no code implementations • NAACL 2018 • Christian Stab, Johannes Daxenberger, Chris Stahlhut, Tristan Miller, Benjamin Schiller, Christopher Tauchmann, Steffen Eger, Iryna Gurevych
Argument mining is a core technology for enabling argument search in large corpora.
1 code implementation • NAACL 2018 • Claudia Schulz, Steffen Eger, Johannes Daxenberger, Tobias Kahse, Iryna Gurevych
We investigate whether and where multi-task learning (MTL) can improve performance on NLP problems related to argumentation mining (AM), in particular argument component identification.
1 code implementation • 4 Mar 2018 • Andreas Rücklé, Steffen Eger, Maxime Peyrard, Iryna Gurevych
Here, we generalize the concept of average word embeddings to power mean word embeddings.
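The power mean of word vectors x_1, ..., x_n is ((1/n) Σ_i x_i^p)^(1/p), computed element-wise; p = 1 recovers the arithmetic mean, while p → −∞ and p → +∞ give the element-wise min and max. A minimal sketch that concatenates several power means into one sentence embedding (the particular p values and vectors are illustrative):

```python
import numpy as np

def power_mean(vectors: np.ndarray, p: float) -> np.ndarray:
    """Element-wise power mean over word vectors (rows). p=1 is the arithmetic
    mean; p=-inf and p=+inf are the element-wise min and max. For other p,
    entries are assumed non-negative so fractional powers stay real."""
    if p == float("inf"):
        return vectors.max(axis=0)
    if p == float("-inf"):
        return vectors.min(axis=0)
    return np.mean(vectors ** p, axis=0) ** (1.0 / p)

def sentence_embedding(word_vectors: np.ndarray,
                       ps=(float("-inf"), 1.0, float("inf"))) -> np.ndarray:
    """Concatenate power means for several p values into one embedding."""
    return np.concatenate([power_mean(word_vectors, p) for p in ps])

words = np.array([[1.0, 4.0], [3.0, 2.0]])  # two 2-d word vectors
emb = sentence_embedding(words)
print(emb)  # [1. 2. 2. 3. 3. 4.]
```

Concatenating min, mean, and max triples the embedding dimension but captures complementary summary statistics of the word vectors.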
1 code implementation • EMNLP 2017 • Johannes Daxenberger, Steffen Eger, Ivan Habernal, Christian Stab, Iryna Gurevych
Argument mining has become a popular research area in NLP.
2 code implementations • ACL 2017 • Steffen Eger, Johannes Daxenberger, Iryna Gurevych
Contrary to models that operate on the argument component level, we find that framing AM as dependency parsing leads to subpar performance results.
no code implementations • ACL 2016 • Steffen Eger, Alexander Mehler
We consider two graph models of semantic change.
1 code implementation • SEMEVAL 2017 • Steffen Eger, Erik-Lân Do Dinh, Ilia Kuznetsov, Masoud Kiaeeha, Iryna Gurevych
From these approaches, we created an ensemble of differently hyper-parameterized systems, achieving a micro-F1-score of 0.63 on the test data.
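Micro-F1 pools true positives, false positives, and false negatives across all instances before computing a single F1 score. A minimal set-based sketch (the label sets below are illustrative, not the shared-task data):

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-instance label sets: aggregate true
    positives, false positives, and false negatives globally, then
    compute precision, recall, and their harmonic mean."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = [{"Process"}, {"Task", "Material"}]
pred = [{"Process"}, {"Task"}]
print(round(micro_f1(gold, pred), 2))  # 0.8
```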
no code implementations • COLING 2016 • Carsten Schnober, Steffen Eger, Erik-Lân Do Dinh, Iryna Gurevych
We analyze the performance of encoder-decoder neural models and compare them with well-known established methods.
no code implementations • COLING 2016 • Steffen Eger, Armin Hoenen, Alexander Mehler
We study the role of the second language in bilingual word embeddings in monolingual semantic evaluation tasks.
no code implementations • LREC 2016 • Steffen Eger, Rüdiger Gleim, Alexander Mehler
This paper relates to the challenge of morphological tagging and lemmatization in morphologically rich languages by example of German and Latin.
no code implementations • 5 Jan 2016 • Tim vor der Brück, Steffen Eger, Alexander Mehler
Our evaluation shows that the power kernel produces F-scores comparable to the reference kernels but, except for the linear kernel, is faster to compute.
no code implementations • 2 Nov 2015 • Steffen Eger
We provide a new asymptotic formula for the case $S=\{(s_1,\ldots, s_N) \:|\: 1\le s_i\le 2\}$.