no code implementations • ACL (CASE) 2021 • Salvatore Giorgi, Vanni Zavarella, Hristo Tanev, Nicolas Stefanovitch, Sy Hwang, Hansi Hettiarachchi, Tharindu Ranasinghe, Vivek Kalyan, Paul Tan, Shaun Tan, Martin Andrews, Tiancheng Hu, Niklas Stoehr, Francesco Ignazio Re, Daniel Vegh, Dennis Atzenhofer, Brenda Curtis, Ali Hürriyetoğlu
State-of-the-art event detection systems are infrequently evaluated on how well they determine the spatio-temporal distribution of events on the ground.
no code implementations • RANLP 2021 • Lasitha Uyangodage, Tharindu Ranasinghe, Hansi Hettiarachchi
False information detection has thus become a surging research topic in recent months.
1 code implementation • OSACT (LREC) 2022 • Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani, Ruslan Mitkov
The goal of the Qur’an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on Qur’an.
no code implementations • 17 Apr 2024 • Marcos Zampieri, Damith Premasiri, Tharindu Ranasinghe
Since most social media data originates from end users, we propose a privacy preserving decentralized architecture for identifying offensive language online by introducing Federated Learning (FL) in the context of offensive language identification.
no code implementations • 3 Apr 2024 • Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Christian Newman, Tharindu Ranasinghe, Marcos Zampieri
Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters.
no code implementations • 26 Mar 2024 • Anna Beatriz Dimas Furtado, Tharindu Ranasinghe, Frédéric Blain, Ruslan Mitkov
In this research, we fill this gap by introducing DORE, the first dataset for Definition MOdelling for PoRtuguEse, containing more than 100,000 definitions.
no code implementations • 25 Mar 2024 • Alistair Plum, Tharindu Ranasinghe, Christoph Purschke
We also create a manually annotated dataset with 2000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision.
4 code implementations • 25 Mar 2024 • Hansi Hettiarachchi, Damith Premasiri, Lasitha Uyangodage, Tharindu Ranasinghe
NSINA is the largest news corpus for Sinhala available to date.
no code implementations • 22 Feb 2024 • Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri
We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset.
no code implementations • 6 Dec 2023 • Tharindu Ranasinghe, Marcos Zampieri
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish).
no code implementations • 1 Dec 2023 • Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Tharindu Ranasinghe
Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available.
no code implementations • 25 Nov 2023 • Md Nishat Raihan, Umma Hani Tanmoy, Anika Binte Islam, Kai North, Tharindu Ranasinghe, Antonios Anastasopoulos, Marcos Zampieri
Identifying offensive content in social media is vital for creating safe online communities.
1 code implementation • 18 Jul 2023 • Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov
Text classification is an area of research which has been studied over the years in Natural Language Processing (NLP).
no code implementations • 19 May 2023 • Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri
To reflect these recent advances, we present a comprehensive survey of papers published between 2017 and 2023 on LS and its sub-tasks with a special focus on deep learning.
no code implementations • 18 May 2023 • Amal Haddad Haddad, Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov
The domain of Botany is rich with metaphorical terms.
2 code implementations • 29 Jan 2023 • Tharindu Cyril Weerasooriya, Sujan Dutta, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh
For (2), we introduce a first-of-its-kind dataset of vicarious offense.
1 code implementation • 1 Dec 2022 • Tharindu Ranasinghe, Isuri Anuradha, Damith Premasiri, Kanishka Silva, Hansi Hettiarachchi, Lasitha Uyangodage, Marcos Zampieri
SOLD is a manually annotated dataset containing 10,000 posts from Twitter annotated as offensive and not offensive at both sentence-level and token-level, improving the explainability of the ML models.
1 code implementation • 22 Nov 2022 • Marcos Zampieri, Tharindu Ranasinghe, Mrinal Chaudhari, Saurabh Gaikwad, Prajwal Krishna, Mayuresh Nene, Shrunali Paygude
We introduce the Marathi Offensive Language Dataset v.2.0 or MOLD 2.0 and present multiple experiments on this dataset.
no code implementations • 18 Nov 2022 • Tharindu Ranasinghe, Kai North, Damith Premasiri, Marcos Zampieri
The widespread presence of offensive content online has become a reason for great concern in recent years, motivating researchers to develop robust systems capable of identifying such content automatically.
no code implementations • COLING 2022 • Kai North, Marcos Zampieri, Tharindu Ranasinghe
To continue improving the performance of LS systems we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words.
no code implementations • 16 Sep 2022 • Damith Premasiri, Amal Haddad Haddad, Tharindu Ranasinghe, Ruslan Mitkov
In this paper, we explore state-of-the-art neural transformers in the task of detecting MWEs in flower and plant names.
no code implementations • 16 Aug 2022 • Damith Premasiri, Tharindu Ranasinghe
Multiword expressions (MWEs) present groups of words in which the meaning of the whole is not derived from the meaning of its parts.
1 code implementation • 12 May 2022 • Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani, Ruslan Mitkov
The goal of the Qur'an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on Qur'an.
no code implementations • 2 May 2022 • Alistair Plum, Tharindu Ranasinghe, Spencer Jones, Constantin Orasan, Ruslan Mitkov
The dataset, which is aimed towards digital humanities (DH) and historical research, is automatically compiled by aligning sentences from Wikipedia articles with matching structured data from sources including Pantheon and Wikidata.
no code implementations • 17 Dec 2021 • Thomas Mandl, Sandip Modha, Gautam Kishore Shahi, Hiren Madhu, Shrey Satapara, Prasenjit Majumder, Johannes Schaefer, Tharindu Ranasinghe, Marcos Zampieri, Durgesh Nandini, Amit Kumar Jaiswal
This paper presents the HASOC subtrack for English, Hindi, and Marathi.
1 code implementation • WMT (EMNLP) 2021 • Diptesh Kanojia, Marina Fomicheva, Tharindu Ranasinghe, Frédéric Blain, Constantin Orăsan, Lucia Specia
However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements.
no code implementations • Findings (EMNLP) 2021 • Diptanu Sarkar, Marcos Zampieri, Tharindu Ranasinghe, Alexander Ororbia
Transformer-based models such as BERT, XLNET, and XLM-R have achieved state-of-the-art performance across various NLP tasks including the identification of offensive language and hate speech, an important problem in social media.
1 code implementation • RANLP 2021 • Saurabh Gaikwad, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan
The widespread presence of offensive language on social media motivated the development of systems capable of recognizing such content automatically.
no code implementations • GermEval 2021 • Skye Morgan, Tharindu Ranasinghe, Marcos Zampieri
This paper addresses the identification of toxic, engaging, and fact-claiming comments on social media.
1 code implementation • ACL 2021 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
Most studies on word-level Quality Estimation (QE) of machine translation focus on language-specific models.
no code implementations • 12 May 2021 • Tharindu Ranasinghe, Marcos Zampieri
We report results of 0.8415 F1 macro for Bengali in the TRAC-2 shared task, 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020, 0.8568 F1 macro for Hindi in the HASOC 2019 shared task, and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval), showing that our approach compares favourably to the best systems submitted to recent shared tasks on these languages.
1 code implementation • NAACL (NLP4IF) 2021 • Lasitha Uyangodage, Tharindu Ranasinghe, Hansi Hettiarachchi
The NLP4IF-2021 shared task on fighting the COVID-19 infodemic was organised to strengthen research in false information detection; participants are asked to predict seven different binary labels regarding false information in a tweet.
1 code implementation • SEMEVAL 2021 • Tharindu Ranasinghe, Diptanu Sarkar, Marcos Zampieri, Alexander Ororbia
In recent years, the widespread use of social media has led to an increase in the generation of toxic and offensive content on online platforms.
no code implementations • SEMEVAL 2021 • Hansi Hettiarachchi, Tharindu Ranasinghe
Identifying whether a word carries the same meaning or different meaning in two contexts is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.
no code implementations • EACL (VarDial) 2021 • Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri
This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at the VarDial 2021 workshop.
1 code implementation • NAACL 2021 • Tharindu Ranasinghe, Marcos Zampieri
The interest in offensive content identification in social media has grown substantially in recent years.
no code implementations • SEMEVAL 2020 • Tharindu Ranasinghe, Hansi Hettiarachchi
In this paper, we describe the team BRUMS entry to OffensEval 2: Multilingual Offensive Language Identification in Social Media in SemEval-2020.
no code implementations • SEMEVAL 2020 • Tharindu Ranasinghe, Alistair Plum, Constantin Orasan, Ruslan Mitkov
This paper presents the RGCL team submission to SemEval 2020 Task 6: DeftEval, subtasks 1 and 2.
no code implementations • 1 Nov 2020 • Tharindu Ranasinghe, Sarthak Gupte, Marcos Zampieri, Ifeoma Nwogu
This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task 2020.
1 code implementation • COLING 2020 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures.
1 code implementation • SEMEVAL 2020 • Hansi Hettiarachchi, Tharindu Ranasinghe
This paper presents the team BRUMS submission to SemEval-2020 Task 3: Graded Word Similarity in Context.
no code implementations • 13 Oct 2020 • Tharindu Ranasinghe, Hansi Hettiarachchi
In this paper, we describe the team BRUMS entry to OffensEval 2: Multilingual Offensive Language Identification in Social Media in SemEval-2020.
no code implementations • 13 Oct 2020 • Tharindu Ranasinghe, Alistair Plum, Constantin Orasan, Ruslan Mitkov
This paper presents the RGCL team submission to SemEval 2020 Task 6: DeftEval, subtasks 1 and 2.
1 code implementation • EMNLP (WNUT) 2020 • Hansi Hettiarachchi, Tharindu Ranasinghe
Identifying informative tweets is an important step when building information extraction systems based on social media.
1 code implementation • EMNLP 2020 • Tharindu Ranasinghe, Marcos Zampieri
In this paper, we take advantage of available English data by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with fewer resources.
1 code implementation • WMT (EMNLP) 2020 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
This paper presents team TransQuest's participation in the Sentence-Level Direct Assessment shared task at WMT 2020.
no code implementations • EAMT 2020 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
Matching and retrieving previously translated segments from a Translation Memory is the key functionality of Translation Memory systems.
1 code implementation • LREC 2020 • Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe
As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc.
no code implementations • RANLP 2019 • Alistair Plum, Tharindu Ranasinghe, Constantin Orasan
This paper compares how different machine learning classifiers can be used together with simple string matching and named entity recognition to detect locations in texts.
no code implementations • RANLP 2019 • Hansi Hettiarachchi, Tharindu Ranasinghe
This paper describes a novel research approach to detect the type and target of offensive posts in social media using a capsule network.
no code implementations • RANLP 2019 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.
no code implementations • RANLP 2019 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov
Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.
no code implementations • 12 Jun 2019 • Nadeesha Pathirana, Sandaru Seneviratne, Rangika Samarawickrama, Shane Wolff, Charith Chitraranjan, Uthayasanker Thayasivam, Tharindu Ranasinghe
Concept identification is a crucial step in understanding and building a knowledge base for any particular domain.
no code implementations • 2 Jun 2019 • Lasitha Uyangoda, Supunmali Ahangama, Tharindu Ranasinghe
A huge amount of user-generated content related to movies is created with the popularization of Web 2.0.
no code implementations • SEMEVAL 2019 • Alistair Plum, Tharindu Ranasinghe, Pablo Calleja, Constantin Orăsan, Ruslan Mitkov
This article describes the system submitted by the RGCL-WLV team to the SemEval 2019 Task 12: Toponym resolution in scientific papers.