1 code implementation • 30 May 2023 • Benjamin Minixhofer, Jonas Pfeiffer, Ivan Vulić
Many NLP pipelines split text into sentences as a crucial preprocessing step.
1 code implementation • 23 May 2023 • Benjamin Minixhofer, Jonas Pfeiffer, Ivan Vulić
We first address the data gap by introducing a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
1 code implementation • 10 Oct 2022 • Selim Fekih, Nicolò Tamagnone, Benjamin Minixhofer, Ranjan Shrestha, Ximena Contla, Ewan Oglethorpe, Navid Rekabsaz
Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can benefit greatly from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain.
1 code implementation • NAACL 2022 • Benjamin Minixhofer, Fabian Paischer, Navid Rekabsaz
Our method makes training large language models for new languages more accessible and less damaging to the environment.
1 code implementation • Findings (ACL) 2021 • Benjamin Minixhofer, Milan Gritta, Ignacio Iacobacci
For small Natural Language Inference (NLI) datasets, language modelling is typically followed by pretraining on a large (labelled) NLI dataset before fine-tuning on each NLI subtask.