Search Results for author: Marius Mosbach

Found 27 papers, 12 papers with code

Some steps towards the generation of diachronic WordNets

no code implementations WS (NoDaLiDa) 2019 Yuri Bizzoni, Marius Mosbach, Dietrich Klakow, Stefania Degaetano-Ortlieb

We apply hyperbolic embeddings to trace the dynamics of change of conceptual-semantic relationships in a large diachronic scientific corpus (200 years).

Discourse-based Argument Segmentation and Annotation

no code implementations ACL (ISA, IWCS) 2021 Ekaterina Saveleva, Volha Petukhova, Marius Mosbach, Dietrich Klakow

We tested the widely used Penn Discourse Treebank full parser (Lin et al., 2010) as well as the state-of-the-art neural models NeuralEDUSeg (Wang et al., 2018) and XLNet (Yang et al., 2019) on two-stage discourse segmentation and discourse relation recognition.

Discourse Segmentation, Segmentation

incom.py 2.0 - Calculating Linguistic Distances and Asymmetries in Auditory Perception of Closely Related Languages

no code implementations RANLP 2021 Marius Mosbach, Irina Stenger, Tania Avgustinova, Bernd Möbius, Dietrich Klakow

We present an extended version of a tool developed for calculating linguistic distances and asymmetries in auditory perception of closely related languages.

regression

Graph-based Argument Quality Assessment

no code implementations RANLP 2021 Ekaterina Saveleva, Volha Petukhova, Marius Mosbach, Dietrich Klakow

The paper presents a novel discourse-based approach to argument quality assessment defined as a graph classification task, where the depth of reasoning (argumentation) is evident from the number and type of detected discourse units and relations between them.

Graph Classification
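
To make the framing above concrete, here is a minimal, hypothetical sketch of argument quality assessment as graph classification over discourse graphs: the features are simply the number and type of discourse units and relations, plus a rough proxy for reasoning depth. The relation labels, toy graphs, and classifier are invented for illustration and are not the paper's model.

```python
# Illustrative sketch only: argument quality assessment as graph classification
# using hand-crafted graph statistics. Relation labels and toy data are made up.
import networkx as nx
from sklearn.linear_model import LogisticRegression

RELATIONS = ["Elaboration", "Contrast", "Cause", "Condition"]  # hypothetical label set

def argument_graph_features(graph: nx.DiGraph) -> list:
    """Summarise an argument graph by the number/type of discourse units and
    relations, plus the longest reasoning chain as a depth proxy."""
    relation_counts = [
        sum(1 for _, _, d in graph.edges(data=True) if d.get("relation") == r)
        for r in RELATIONS
    ]
    depth = nx.dag_longest_path_length(graph) if nx.is_directed_acyclic_graph(graph) else 0
    return [graph.number_of_nodes(), graph.number_of_edges(), depth, *relation_counts]

def toy_graph(n_units: int, relation: str) -> nx.DiGraph:
    """Build a chain of discourse units connected by a single relation type."""
    g = nx.DiGraph()
    for i in range(n_units - 1):
        g.add_edge(i, i + 1, relation=relation)
    return g

# Toy training data: deeper, more elaborated graphs labelled as higher quality.
graphs = [toy_graph(2, "Contrast"), toy_graph(3, "Elaboration"),
          toy_graph(5, "Cause"), toy_graph(6, "Elaboration")]
labels = [0, 0, 1, 1]  # 0 = low quality, 1 = high quality

clf = LogisticRegression().fit([argument_graph_features(g) for g in graphs], labels)
print(clf.predict([argument_graph_features(toy_graph(4, "Cause"))]))
```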

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

2 code implementations 9 Apr 2024 Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).

Contrastive Learning
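
As a rough illustration of turning a decoder-only LM into a text encoder, the sketch below mean-pools the hidden states of a small causal model over non-padding tokens (GPT-2 is used only because it is small). This is not the LLM2Vec recipe itself, which additionally enables bidirectional attention and adds masked next-token and unsupervised contrastive training.

```python
# Generic sketch: pool a decoder-only LM's hidden states into a text embedding.
# NOT the LLM2Vec method; "gpt2" is an assumption chosen purely for size.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state           # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling

emb = embed(["a powerful text encoder", "an autoregressive language model"])
print(torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0))
```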

What explains the success of cross-modal fine-tuning with ORCA?

no code implementations 20 Mar 2024 Paloma García-de-Herreros, Vagrant Gautam, Philipp Slusallek, Dietrich Klakow, Marius Mosbach

ORCA (Shen et al., 2023) is a recent technique for cross-modal fine-tuning, i.e., applying pre-trained transformer models to modalities beyond their training data.

The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis

no code implementations 20 Feb 2024 Miaoran Zhang, Vagrant Gautam, Mingyang Wang, Jesujoba O. Alabi, Xiaoyu Shen, Dietrich Klakow, Marius Mosbach

Compared to work on monolingual (English) in-context learning, multilingual in-context learning is under-explored, and we lack an in-depth understanding of the role of demonstrations in this context.

In-Context Learning
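
For readers unfamiliar with the setup, the snippet below shows how in-context demonstrations are typically concatenated into a prompt before the query. The template and the multilingual toy examples are invented for illustration and do not reproduce the paper's experiments.

```python
# Sketch of assembling in-context demonstrations into a prompt; the template
# and toy examples are assumptions for illustration only.
def build_icl_prompt(demonstrations, query, template="{text} => {label}"):
    """Concatenate labelled demonstrations followed by the unlabelled query."""
    shots = [template.format(text=t, label=l) for t, l in demonstrations]
    shots.append(template.format(text=query, label="").rstrip())
    return "\n".join(shots)

demos = [
    ("Das Essen war hervorragend.", "positive"),   # German
    ("La película fue aburrida.", "negative"),     # Spanish
]
print(build_icl_prompt(demos, "Le service était très lent."))  # French query
```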

The Hidden Space of Transformer Language Adapters

no code implementations 20 Feb 2024 Jesujoba O. Alabi, Marius Mosbach, Matan Eyal, Dietrich Klakow, Mor Geva

We analyze the operation of transformer language adapters, which are small modules trained on top of a frozen language model to adapt its predictions to new target languages.

Language Modelling
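
For context, a language adapter of the kind analysed here is typically a small bottleneck module applied to a frozen model's hidden states. Below is a minimal PyTorch sketch of such a bottleneck adapter; the dimensions, activation, and placement are illustrative assumptions, not the exact configuration studied in the paper.

```python
# Minimal sketch of a typical bottleneck adapter layer (illustrative sizes).
import torch
from torch import nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection keeps the frozen model's representation and
        # lets the adapter learn a small, language-specific correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

adapter = BottleneckAdapter(hidden_dim=768)
frozen_hidden = torch.randn(2, 10, 768)  # (batch, seq, hidden) from a frozen LM
print(adapter(frozen_hidden).shape)      # torch.Size([2, 10, 768])
```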

Large GPT-like Models are Bad Babies: A Closer Look at the Relationship between Linguistic Competence and Psycholinguistic Measures

no code implementations 8 Nov 2023 Julius Steuer, Marius Mosbach, Dietrich Klakow

Research on the cognitive plausibility of language models (LMs) has so far concentrated mostly on modelling psycholinguistic response variables such as reading times, gaze durations, and N400/P600 EEG signals, while largely leaving out what Mahowald et al. (2023) described as formal and functional linguistic competence, as well as developmental plausibility.

EEG

Weaker Than You Think: A Critical Look at Weakly Supervised Learning

1 code implementation 27 May 2023 Dawei Zhu, Xiaoyu Shen, Marius Mosbach, Andreas Stephan, Dietrich Klakow

In this paper, we revisit the setup of these approaches and find that their benefits are significantly overestimated.

Weakly-supervised Learning

Few-shot Fine-tuning vs. In-context Learning: A Fair Comparison and Evaluation

1 code implementation 26 May 2023 Marius Mosbach, Tiago Pimentel, Shauli Ravfogel, Dietrich Klakow, Yanai Elazar

In this paper, we compare the generalization of few-shot fine-tuning and in-context learning to challenge datasets, while controlling for the models used, the number of examples, and the number of parameters, ranging from 125M to 30B.

Domain Generalization, In-Context Learning

Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models

1 code implementation 4 Aug 2022 Vilém Zouhar, Marius Mosbach, Dietrich Klakow

We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion (e.g., concatenation) to obtain a richer context representation for language modelling.

Language Modelling, Sentence, +1
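
A minimal sketch of the concatenation-style fusion described above: a fixed sentence/prefix embedding is concatenated to every token embedding before the LSTM. All dimensions and the toy inputs are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of concatenation fusion in an LSTM language model; sizes are illustrative.
import torch
from torch import nn

class FusionLSTMLM(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=128, prefix_dim=768, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + prefix_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens: torch.Tensor, prefix: torch.Tensor) -> torch.Tensor:
        tok = self.embed(tokens)                                   # (B, T, emb_dim)
        # Repeat the fixed prefix embedding at every time step and concatenate.
        fused = torch.cat([tok, prefix.unsqueeze(1).expand(-1, tok.size(1), -1)], dim=-1)
        hidden, _ = self.lstm(fused)
        return self.out(hidden)                                    # next-token logits

model = FusionLSTMLM()
logits = model(torch.randint(0, 1000, (2, 12)), torch.randn(2, 768))
print(logits.shape)  # torch.Size([2, 12, 1000])
```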

Measuring Causal Effects of Data Statistics on Language Model's 'Factual' Predictions

no code implementations 28 Jul 2022 Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg

Our causal framework and our results demonstrate the importance of studying datasets and the benefits of causality for understanding NLP models.

StereoKG: Data-Driven Knowledge Graph Construction for Cultural Knowledge and Stereotypes

1 code implementation NAACL (WOAH) 2022 Awantee Deshpande, Dana Ruiter, Marius Mosbach, Dietrich Klakow

Analyzing ethnic or religious bias is important for improving fairness, accountability, and transparency of natural language processing models.

Fairness, graph construction, +2

Adapting Pre-trained Language Models to African Languages via Multilingual Adaptive Fine-Tuning

1 code implementation COLING 2022 Jesujoba O. Alabi, David Ifeoluwa Adelani, Marius Mosbach, Dietrich Klakow

Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several downstream tasks for both high-resourced and low-resourced languages.

NER, Sentiment Analysis, +5

Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study

1 code implementation 16 Jun 2021 Badr M. Abdullah, Marius Mosbach, Iuliia Zaitova, Bernd Möbius, Dietrich Klakow

Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the word discrimination task does not necessarily yield models that better reflect word phonological similarity.

Word Embeddings
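
The kind of analysis referred to above can be sketched as a rank correlation between embedding distances and a phonological distance. In the code below, the embeddings are random toy vectors and plain Levenshtein distance over phone sequences stands in for a proper phonological distance; both are assumptions for illustration only.

```python
# Sketch: correlate embedding-space distances with a phonological distance.
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

def edit_distance(a, b) -> int:
    """Plain Levenshtein distance over phone sequences (a simplistic stand-in)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

rng = np.random.default_rng(0)
words = {"cat": "k ae t", "cap": "k ae p", "dog": "d ao g", "dot": "d aa t"}
embeddings = {w: rng.normal(size=32) for w in words}  # toy "acoustic" embeddings

pairs = [(a, b) for i, a in enumerate(words) for b in list(words)[i + 1:]]
emb_dist = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
pho_dist = [edit_distance(words[a].split(), words[b].split()) for a, b in pairs]
print(spearmanr(emb_dist, pho_dist))  # rank correlation over word pairs
```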

A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English

1 code implementation COLING 2020 Marius Mosbach, Stefania Degaetano-Ortlieb, Marie-Pauline Krielke, Badr M. Abdullah, Dietrich Klakow

Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on.

Sentence

Fusion Models for Improved Visual Captioning

no code implementations 28 Oct 2020 Marimuthu Kalimuthu, Aditya Mogadala, Marius Mosbach, Dietrich Klakow

Building on these recent developments, and with the aim of improving the quality of generated captions, the contribution of our work is two-fold: first, we propose a generic multimodal model fusion framework for caption generation as well as emendation, in which we use different fusion strategies to integrate a pretrained Auxiliary Language Model (AuxLM) into traditional encoder-decoder visual captioning frameworks.

Automatic Speech Recognition (ASR), +5

On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers

no code implementations EMNLP (BlackboxNLP) 2020 Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, Dietrich Klakow

Our analysis reveals that while fine-tuning indeed changes the representations of a pre-trained model, and these changes are typically larger for higher layers, only in very few cases does fine-tuning improve probing accuracy beyond simply using the pre-trained model with a strong pooling method.

Sentence

Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation

no code implementations 12 Jul 2020 Aditya Mogadala, Marius Mosbach, Dietrich Klakow

Generating longer textual sequences when conditioned on the visual information is an interesting problem to explore.

Graph-to-Sequence, Sentence, +1

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

2 code implementations ICLR 2021 Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow

Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.

Misconceptions

On the security relevance of weights in deep learning

no code implementations 8 Feb 2019 Kathrin Grosse, Thomas A. Trost, Marius Mosbach, Michael Backes, Dietrich Klakow

Recently, a weight-based attack on stochastic gradient descent that induces overfitting has been proposed.

Logit Pairing Methods Can Fool Gradient-Based Attacks

1 code implementation 29 Oct 2018 Marius Mosbach, Maksym Andriushchenko, Thomas Trost, Matthias Hein, Dietrich Klakow

Recently, Kannan et al. [2018] proposed several logit regularization methods to improve the adversarial robustness of classifiers.

Adversarial Robustness
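
For orientation, the sketch below shows the general form of an adversarial logit pairing loss: a classification loss on adversarial inputs plus a penalty that pulls clean and adversarial logits together. The toy model, the noise-based stand-in for an attack, and the pairing weight are illustrative assumptions; see Kannan et al. [2018] and the paper above for the actual methods and for why gradient-based evaluation of such defenses can be misleading.

```python
# Sketch of a logit pairing style loss on a toy classifier; all values illustrative.
import torch
from torch import nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
x = torch.randn(8, 20)
y = torch.randint(0, 10, (8,))
x_adv = x + 0.1 * torch.randn_like(x).sign()  # stand-in for a real attack (e.g. PGD)

logits_clean, logits_adv = model(x), model(x_adv)
pairing_weight = 0.5  # hypothetical trade-off coefficient
loss = (F.cross_entropy(logits_adv, y)                       # classification loss
        + pairing_weight * F.mse_loss(logits_adv, logits_clean))  # logit pairing term
loss.backward()
print(float(loss))
```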
