Search Results for author: Tharindu Ranasinghe

Found 55 papers, 19 papers with code

Discovering Black Lives Matter Events in the United States: Shared Task 3, CASE 2021

no code implementations • ACL (CASE) 2021 • Salvatore Giorgi, Vanni Zavarella, Hristo Tanev, Nicolas Stefanovitch, Sy Hwang, Hansi Hettiarachchi, Tharindu Ranasinghe, Vivek Kalyan, Paul Tan, Shaun Tan, Martin Andrews, Tiancheng Hu, Niklas Stoehr, Francesco Ignazio Re, Daniel Vegh, Dennis Atzenhofer, Brenda Curtis, Ali Hürriyetoğlu

Evaluating the state-of-the-art event detection systems on determining spatio-temporal distribution of the events on the ground is performed infrequently.

Event Detection

Can Multilingual Transformers Fight the COVID-19 Infodemic?

no code implementations • RANLP 2021 • Lasitha Uyangodage, Tharindu Ranasinghe, Hansi Hettiarachchi

False information detection has thus become a surging research topic in recent months.

BIG-bench Machine Learning

DTW at Qur’an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain

1 code implementation • OSACT (LREC) 2022 • Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani, Ruslan Mitkov

The goal of the Qur’an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on Qur’an.

Ensemble Learning Machine Reading Comprehension +3

A Federated Learning Approach to Privacy Preserving Offensive Language Identification

no code implementations • 17 Apr 2024 • Marcos Zampieri, Damith Premasiri, Tharindu Ranasinghe

Since most social media data originates from end users, we propose a privacy preserving decentralized architecture for identifying offensive language online by introducing Federated Learning (FL) in the context of offensive language identification.

Federated Learning Language Identification +1

CSEPrompts: A Benchmark of Introductory Computer Science Prompts

no code implementations • 3 Apr 2024 • Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Christian Newman, Tharindu Ranasinghe, Marcos Zampieri

Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters.

Multiple-choice

DORE: A Dataset For Portuguese Definition Generation

no code implementations • 26 Mar 2024 • Anna Beatriz Dimas Furtado, Tharindu Ranasinghe, Frédéric Blain, Ruslan Mitkov

In this research, we fill this gap by introducing DORE, the first dataset for Definition MOdelling for PoRtuguEse containing more than 100,000 definitions.

Definition Modelling Text Generation

Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language

no code implementations • 25 Mar 2024 • Alistair Plum, Tharindu Ranasinghe, Christoph Purschke

We also create a manually annotated dataset with 2000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision.

Relation Relation Extraction

NSINA: A News Corpus for Sinhala

4 code implementations • 25 Mar 2024 • Hansi Hettiarachchi, Damith Premasiri, Lasitha Uyangodage, Tharindu Ranasinghe

NSINA is the largest news corpus for Sinhala available to date.

Benchmarking Headline Generation

MultiLS: A Multi-task Lexical Simplification Framework

no code implementations • 22 Feb 2024 • Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri

We present MultiLS, the first LS framework that allows for the creation of a multi-task LS dataset.

Lexical Complexity Prediction Lexical Simplification +1

A Text-to-Text Model for Multilingual Offensive Language Identification

no code implementations • 6 Dec 2023 • Tharindu Ranasinghe, Marcos Zampieri

Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish).

Language Identification XLM-R

SurreyAI 2023 Submission for the Quality Estimation Shared Task

no code implementations • 1 Dec 2023 • Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Tharindu Ranasinghe

Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available.

Sentence

Offensive Language Identification in Transliterated and Code-Mixed Bangla

no code implementations • 25 Nov 2023 • Md Nishat Raihan, Umma Hani Tanmoy, Anika Binte Islam, Kai North, Tharindu Ranasinghe, Antonios Anastasopoulos, Marcos Zampieri

Identifying offensive content in social media is vital for creating safe online communities.

Language Identification

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study

1 code implementation • 18 Jul 2023 • Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov

Text classification is an area of research which has been studied over the years in Natural Language Processing (NLP).

Document Classification text-classification

Deep Learning Approaches to Lexical Simplification: A Survey

no code implementations • 19 May 2023 • Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri

To reflect these recent advances, we present a comprehensive survey of papers published between 2017 and 2023 on LS and its sub-tasks with a special focus on deep learning.

Lexical Simplification Sentence +1

Deep Learning Methods for Extracting Metaphorical Names of Flowers and Plants

no code implementations • 18 May 2023 • Amal Haddad Haddad, Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov

The domain of Botany is rich with metaphorical terms.

Machine Translation Translation

Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive

2 code implementations • 29 Jan 2023 • Tharindu Cyril Weerasooriya, Sujan Dutta, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh

For (2), we introduce a first-of-its-kind dataset of vicarious offense.

Language Modelling Large Language Model +1

SOLD: Sinhala Offensive Language Dataset

1 code implementation • 1 Dec 2022 • Tharindu Ranasinghe, Isuri Anuradha, Damith Premasiri, Kanishka Silva, Hansi Hettiarachchi, Lasitha Uyangodage, Marcos Zampieri

SOLD is a manually annotated dataset containing 10,000 posts from Twitter annotated as offensive and not offensive at both sentence-level and token-level, improving the explainability of the ML models.

Language Identification Sentence

Predicting the Type and Target of Offensive Social Media Posts in Marathi

1 code implementation • 22 Nov 2022 • Marcos Zampieri, Tharindu Ranasinghe, Mrinal Chaudhari, Saurabh Gaikwad, Prajwal Krishna, Mayuresh Nene, Shrunali Paygude

We introduce the Marathi Offensive Language Dataset v.2.0 or MOLD 2.0 and present multiple experiments on this dataset.

Language Identification

Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi

no code implementations • 18 Nov 2022 • Tharindu Ranasinghe, Kai North, Damith Premasiri, Marcos Zampieri

The widespread presence of offensive content online has become a reason for great concern in recent years, motivating researchers to develop robust systems capable of identifying such content automatically.

Language Identification

ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification

no code implementations • COLING 2022 • Kai North, Marcos Zampieri, Tharindu Ranasinghe

To continue improving the performance of LS systems we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words.

Lexical Simplification XLM-R

Transformer-based Detection of Multiword Expressions in Flower and Plant Names

no code implementations • 16 Sep 2022 • Damith Premasiri, Amal Haddad Haddad, Tharindu Ranasinghe, Ruslan Mitkov

In this paper, we explore state-of-the-art neural transformers in the task of detecting MWEs in flower and plant names.

Machine Translation Translation

BERT(s) to Detect Multiword Expressions

no code implementations • 16 Aug 2022 • Damith Premasiri, Tharindu Ranasinghe

Multiword expressions (MWEs) are groups of words in which the meaning of the whole is not derived from the meaning of its parts.

Machine Translation Translation

DTW at Qur'an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain

1 code implementation • 12 May 2022 • Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani, Ruslan Mitkov

The goal of the Qur'an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on Qur'an.

Ensemble Learning Machine Reading Comprehension +3

Biographical: A Semi-Supervised Relation Extraction Dataset

no code implementations • 2 May 2022 • Alistair Plum, Tharindu Ranasinghe, Spencer Jones, Constantin Orasan, Ruslan Mitkov

The dataset, which is aimed towards digital humanities (DH) and historical research, is automatically compiled by aligning sentences from Wikipedia articles with matching structured data from sources including Pantheon and Wikidata.

Knowledge Graphs named-entity-recognition +6

Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

no code implementations • 17 Dec 2021 • Thomas Mandl, Sandip Modha, Gautam Kishore Shahi, Hiren Madhu, Shrey Satapara, Prasenjit Majumder, Johannes Schaefer, Tharindu Ranasinghe, Marcos Zampieri, Durgesh Nandini, Amit Kumar Jaiswal

This paper presents the HASOC subtrack for English, Hindi, and Marathi.

Binary Classification Classification

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

1 code implementation • WMT (EMNLP) 2021 • Diptesh Kanojia, Marina Fomicheva, Tharindu Ranasinghe, Frédéric Blain, Constantin Orăsan, Lucia Specia

However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements.

Machine Translation Translation

FBERT: A Neural Transformer for Identifying Offensive Content

no code implementations • Findings (EMNLP) 2021 • Diptanu Sarkar, Marcos Zampieri, Tharindu Ranasinghe, Alexander Ororbia

Transformer-based models such as BERT, XLNET, and XLM-R have achieved state-of-the-art performance across various NLP tasks including the identification of offensive language and hate speech, an important problem in social media.

Language Identification XLM-R

Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi

1 code implementation • RANLP 2021 • Saurabh Gaikwad, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan

The widespread presence of offensive language on social media motivated the development of systems capable of recognizing such content automatically.

Language Identification Transfer Learning

WLV-RIT at GermEval 2021: Multitask Learning with Transformers to Detect Toxic, Engaging, and Fact-Claiming Comments

no code implementations • GermEval 2021 • Skye Morgan, Tharindu Ranasinghe, Marcos Zampieri

This paper addresses the identification of toxic, engaging, and fact-claiming comments on social media.

An Exploratory Analysis of Multilingual Word-Level Quality Estimation with Cross-Lingual Transformers

1 code implementation • ACL 2021 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Most studies on word-level Quality Estimation (QE) of machine translation focus on language-specific models.

Machine Translation Translation

Multilingual Offensive Language Identification for Low-resource Languages

no code implementations • 12 May 2021 • Tharindu Ranasinghe, Marcos Zampieri

We report results of 0.8415 F1 macro for Bengali in TRAC-2 shared task, 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020, 0.8568 F1 macro for Hindi in HASOC 2019 shared task and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval) showing that our approach compares favourably to the best systems submitted to recent shared tasks on these three languages.

Language Identification Transfer Learning +1

Transformers to Fight the COVID-19 Infodemic

1 code implementation • NAACL (NLP4IF) 2021 • Lasitha Uyangodage, Tharindu Ranasinghe, Hansi Hettiarachchi

NLP4IF-2021 shared task on fighting the COVID-19 infodemic has been organised to strengthen the research in false information detection where the participants are asked to predict seven different binary labels regarding false information in a tweet.

WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans

1 code implementation • SEMEVAL 2021 • Tharindu Ranasinghe, Diptanu Sarkar, Marcos Zampieri, Alexander Ororbia

In recent years, the widespread use of social media has led to an increase in the generation of toxic and offensive content on online platforms.

Toxic Spans Detection

TransWiC at SemEval-2021 Task 2: Transformer-based Multilingual and Cross-lingual Word-in-Context Disambiguation

no code implementations • SEMEVAL 2021 • Hansi Hettiarachchi, Tharindu Ranasinghe

Identifying whether a word carries the same meaning or different meaning in two contexts is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.

Information Retrieval Question Answering +2

Comparing Approaches to Dravidian Language Identification

no code implementations • EACL (VarDial) 2021 • Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri

This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at VarDial 2021 workshop.

Dialect Identification text-classification +1

MUDES: Multilingual Detection of Offensive Spans

1 code implementation • NAACL 2021 • Tharindu Ranasinghe, Marcos Zampieri

The interest in offensive content identification in social media has grown substantially in recent years.

BRUMS at SemEval-2020 Task 12: Transformer Based Multilingual Offensive Language Identification in Social Media

no code implementations • SEMEVAL 2020 • Tharindu Ranasinghe, Hansi Hettiarachchi

In this paper, we describe the team BRUMS entry to OffensEval 2: Multilingual Offensive Language Identification in Social Media in SemEval-2020.

Language Identification

RGCL at SemEval-2020 Task 6: Neural Approaches to Definition Extraction

no code implementations • SEMEVAL 2020 • Tharindu Ranasinghe, Alistair Plum, Constantin Orasan, Ruslan Mitkov

This paper presents the RGCL team submission to SemEval 2020 Task 6: DeftEval, subtasks 1 and 2.

Sentence

WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments

no code implementations • 1 Nov 2020 • Tharindu Ranasinghe, Sarthak Gupte, Marcos Zampieri, Ifeoma Nwogu

This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task 2020.

Language Identification Transfer Learning +1

TransQuest: Translation Quality Estimation with Cross-lingual Transformers

1 code implementation • COLING 2020 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures.

Sentence Transfer Learning +1

BRUMS at SemEval-2020 Task 3: Contextualised Embeddings for Predicting the (Graded) Effect of Context in Word Similarity

1 code implementation • SEMEVAL 2020 • Hansi Hettiarachchi, Tharindu Ranasinghe

This paper presents the team BRUMS submission to SemEval-2020 Task 3: Graded Word Similarity in Context.

Position Word Embeddings +1

BRUMS at SemEval-2020 Task 12: Transformer based Multilingual Offensive Language Identification in Social Media

no code implementations • 13 Oct 2020 • Tharindu Ranasinghe, Hansi Hettiarachchi

In this paper, we describe the team BRUMS entry to OffensEval 2: Multilingual Offensive Language Identification in Social Media in SemEval-2020.

Language Identification

RGCL at SemEval-2020 Task 6: Neural Approaches to Definition Extraction

no code implementations • 13 Oct 2020 • Tharindu Ranasinghe, Alistair Plum, Constantin Orasan, Ruslan Mitkov

This paper presents the RGCL team submission to SemEval 2020 Task 6: DeftEval, subtasks 1 and 2.

Definition Extraction Sentence

InfoMiner at WNUT-2020 Task 2: Transformer-based Covid-19 Informative Tweet Extraction

1 code implementation • EMNLP (WNUT) 2020 • Hansi Hettiarachchi, Tharindu Ranasinghe

Identifying informative tweets is an important step when building information extraction systems based on social media.

Task 2

Multilingual Offensive Language Identification with Cross-lingual Embeddings

1 code implementation • EMNLP 2020 • Tharindu Ranasinghe, Marcos Zampieri

In this paper, we take advantage of English data available by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with fewer resources.

Language Identification Transfer Learning +1

TransQuest at WMT2020: Sentence-Level Direct Assessment

1 code implementation • WMT (EMNLP) 2020 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

This paper presents the team TransQuest's participation in Sentence-Level Direct Assessment shared task in WMT 2020.

Data Augmentation Sentence

Intelligent Translation Memory Matching and Retrieval with Sentence Encoders

no code implementations • EAMT 2020 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Matching and retrieving previously translated segments from a Translation Memory is the key functionality in Translation Memory systems.

Retrieval Sentence +1

Offensive Language Identification in Greek

1 code implementation • LREC 2020 • Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe

As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc.

Language Identification

Toponym Detection in the Bio-Medical Domain: A Hybrid Approach with Deep Learning

no code implementations • RANLP 2019 • Alistair Plum, Tharindu Ranasinghe, Constantin Orasan

This paper compares how different machine learning classifiers can be used together with simple string matching and named entity recognition to detect locations in texts.

BIG-bench Machine Learning named-entity-recognition +4

Emoji Powered Capsule Network to Detect Type and Target of Offensive Posts in Social Media

no code implementations • RANLP 2019 • Hansi Hettiarachchi, Tharindu Ranasinghe

This paper describes a novel research approach to detect type and target of offensive posts in social media using a capsule network.

Semantic Textual Similarity with Siamese Neural Networks

no code implementations • RANLP 2019 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.

Information Retrieval Question Answering +3

Enhancing Unsupervised Sentence Similarity Methods with Deep Contextualised Word Representations

no code implementations • RANLP 2019 • Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.

Contextualised Word Representations Information Retrieval +6

Concept Discovery through Information Extraction in Restaurant Domain

no code implementations • 12 Jun 2019 • Nadeesha Pathirana, Sandaru Seneviratne, Rangika Samarawickrama, Shane Wolff, Charith Chitraranjan, Uthayasanker Thayasivam, Tharindu Ranasinghe

Concept identification is a crucial step in understanding and building a knowledge base for any particular domain.

Clustering General Classification

User Profile Feature-Based Approach to Address the Cold Start Problem in Collaborative Filtering for Personalized Movie Recommendation

no code implementations • 2 Jun 2019 • Lasitha Uyangoda, Supunmali Ahangama, Tharindu Ranasinghe

A huge amount of user generated content related to movies is created with the popularization of Web 2.0.

Collaborative Filtering Movie Recommendation +1

RGCL-WLV at SemEval-2019 Task 12: Toponym Detection

no code implementations • SEMEVAL 2019 • Alistair Plum, Tharindu Ranasinghe, Pablo Calleja, Constantin Orăsan, Ruslan Mitkov

This article describes the system submitted by the RGCL-WLV team to the SemEval 2019 Task 12: Toponym resolution in scientific papers.

BIG-bench Machine Learning Toponym Resolution
