Search Results for author: Benjamin Minixhofer

Found 5 papers, 5 papers with code

Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation

1 code implementation • 30 May 2023 • Benjamin Minixhofer, Jonas Pfeiffer, Ivan Vulić

Many NLP pipelines split text into sentences as a crucial preprocessing step.

Machine Translation • Segmentation • +2 more

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models

1 code implementation • 23 May 2023 • Benjamin Minixhofer, Jonas Pfeiffer, Ivan Vulić

We first address the lack of public decompounding data by introducing a dataset of 255k compound and non-compound words across 56 diverse languages, obtained from Wiktionary.

HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response

1 code implementation • 10 Oct 2022 • Selim Fekih, Nicolò Tamagnone, Benjamin Minixhofer, Ranjan Shrestha, Ximena Contla, Ewan Oglethorpe, Navid Rekabsaz

Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can benefit greatly from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain.

Humanitarian • Multilabel Text Classification • +2 more

WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models

1 code implementation • NAACL 2022 • Benjamin Minixhofer, Fabian Paischer, Navid Rekabsaz

Our method makes training large language models for new languages more accessible and less damaging to the environment.

Cross-Lingual Transfer • Word Embeddings

Enhancing Transformers with Gradient Boosted Decision Trees for NLI Fine-Tuning

1 code implementation • Findings (ACL) 2021 • Benjamin Minixhofer, Milan Gritta, Ignacio Iacobacci

For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning on each NLI subtask.

Language Modelling • Natural Language Inference • +1 more
