no code implementations • 3 Apr 2024 • Shijia Zhou, Huangyan Shan, Barbara Plank, Robert Litschko
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual.
no code implementations • 3 Nov 2023 • Gretel Liz De la Peña Sarracén, Paolo Rosso, Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto
In this work, we resort to data augmentation and continual pre-training for domain adaptation to improve cross-lingual abusive language detection.
no code implementations • 9 Oct 2023 • Robert Litschko, Max Müller-Eberstein, Rob van der Goot, Leon Weber, Barbara Plank
Language understanding is a multi-faceted cognitive capability, which the Natural Language Processing (NLP) community has striven to model computationally for decades.
1 code implementation • 4 Sep 2023 • Leon Weber-Genzel, Robert Litschko, Ekaterina Artemova, Barbara Plank
Our results show that the choice of the right AED method and model size is indeed crucial; we derive practical recommendations for how to use AED methods to clean instruction-tuning data.
1 code implementation • 11 May 2023 • Onur Galoğlu, Robert Litschko, Goran Glavaš
While a large body of work has leveraged MMTs to mine parallel data and induce bilingual document embeddings, much less effort has been devoted to training a general-purpose (massively) multilingual document encoder that can be used for both supervised and unsupervised document-level tasks.
1 code implementation • 9 May 2023 • Robert Litschko, Ekaterina Artemova, Barbara Plank
Transferring information retrieval (IR) models from a high-resource language (typically English) to other languages in a zero-shot fashion has become a widely adopted approach.
3 code implementations • NAACL (MIA) 2022 • Chia-Chien Hung, Tommaso Green, Robert Litschko, Tornike Tsereteli, Sotaro Takeshita, Marco Bombieri, Goran Glavaš, Simone Paolo Ponzetto
This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA).
1 code implementation • COLING 2022 • Robert Litschko, Ivan Vulić, Goran Glavaš
Current approaches therefore commonly transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders: they fine-tune all parameters of pretrained massively multilingual Transformers (MMTs, e.g., multilingual BERT) on English relevance judgments, and then deploy them in the target language(s).
1 code implementation • 21 Dec 2021 • Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš
In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs.
1 code implementation • 21 Jan 2021 • Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš
Therefore, in this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs.
no code implementations • EMNLP 2020 • Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš, Anna Korhonen
The success of large pretrained language models (LMs) such as BERT and RoBERTa has sparked interest in probing their representations, in order to unveil what types of knowledge they implicitly capture.
no code implementations • COLING 2020 • Robert Litschko, Ivan Vulić, Željko Agić, Goran Glavaš
Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, "at treebank level".
1 code implementation • ACL 2019 • Goran Glavaš, Robert Litschko, Sebastian Ruder, Ivan Vulić
In this work, we make the first step towards a comprehensive evaluation of cross-lingual word embeddings.
1 code implementation • 2 May 2018 • Robert Litschko, Goran Glavaš, Simone Paolo Ponzetto, Ivan Vulić
We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all.