Search Results for author: Tharindu Ranasinghe

Found 55 papers, 19 papers with code

A Federated Learning Approach to Privacy Preserving Offensive Language Identification

no code implementations 17 Apr 2024 Marcos Zampieri, Damith Premasiri, Tharindu Ranasinghe

Since most social media data originates from end users, we propose a privacy preserving decentralized architecture for identifying offensive language online by introducing Federated Learning (FL) in the context of offensive language identification.

Federated Learning Language Identification +1

CSEPrompts: A Benchmark of Introductory Computer Science Prompts

no code implementations 3 Apr 2024 Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Christian Newman, Tharindu Ranasinghe, Marcos Zampieri

Recent advances in AI, machine learning, and NLP have led to the development of a new generation of Large Language Models (LLMs) that are trained on massive amounts of data and often have trillions of parameters.

Multiple-choice

DORE: A Dataset For Portuguese Definition Generation

no code implementations 26 Mar 2024 Anna Beatriz Dimas Furtado, Tharindu Ranasinghe, Frédéric Blain, Ruslan Mitkov

In this research, we fill this gap by introducing DORE; the first dataset for Definition MOdelling for PoRtuguEse containing more than 100,000 definitions.

Definition Modelling Text Generation

Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language

no code implementations 25 Mar 2024 Alistair Plum, Tharindu Ranasinghe, Christoph Purschke

We also create a manually annotated dataset with 2000 instances to evaluate the models and release it together with the dataset compiled using guided distant supervision.

Relation Relation Extraction

A Text-to-Text Model for Multilingual Offensive Language Identification

no code implementations 6 Dec 2023 Tharindu Ranasinghe, Marcos Zampieri

Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5 and evaluate its performance on a set of six different languages (German, Hindi, Korean, Marathi, Sinhala, and Spanish).

Language Identification XLM-R

SurreyAI 2023 Submission for the Quality Estimation Shared Task

no code implementations 1 Dec 2023 Archchana Sindhujan, Diptesh Kanojia, Constantin Orasan, Tharindu Ranasinghe

Quality Estimation (QE) systems are important in situations where it is necessary to assess the quality of translations, but there is no reference available.

Sentence

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study

1 code implementation 18 Jul 2023 Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov

Text classification is an area of research which has been studied over the years in Natural Language Processing (NLP).

Document Classification text-classification

Deep Learning Approaches to Lexical Simplification: A Survey

no code implementations 19 May 2023 Kai North, Tharindu Ranasinghe, Matthew Shardlow, Marcos Zampieri

To reflect these recent advances, we present a comprehensive survey of papers published between 2017 and 2023 on LS and its sub-tasks with a special focus on deep learning.

Lexical Simplification Sentence +1

SOLD: Sinhala Offensive Language Dataset

1 code implementation 1 Dec 2022 Tharindu Ranasinghe, Isuri Anuradha, Damith Premasiri, Kanishka Silva, Hansi Hettiarachchi, Lasitha Uyangodage, Marcos Zampieri

SOLD is a manually annotated dataset containing 10,000 posts from Twitter annotated as offensive and not offensive at both sentence-level and token-level, improving the explainability of the ML models.

Language Identification Sentence

Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi

no code implementations 18 Nov 2022 Tharindu Ranasinghe, Kai North, Damith Premasiri, Marcos Zampieri

The spread of offensive content online has become a reason for great concern in recent years, motivating researchers to develop robust systems capable of identifying such content automatically.

Language Identification

ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification

no code implementations COLING 2022 Kai North, Marcos Zampieri, Tharindu Ranasinghe

To continue improving the performance of LS systems we introduce ALEXSIS-PT, a novel multi-candidate dataset for Brazilian Portuguese LS containing 9,605 candidate substitutions for 387 complex words.

Lexical Simplification XLM-R

Transformer-based Detection of Multiword Expressions in Flower and Plant Names

no code implementations 16 Sep 2022 Damith Premasiri, Amal Haddad Haddad, Tharindu Ranasinghe, Ruslan Mitkov

In this paper, we explore state-of-the-art neural transformers in the task of detecting MWEs in flower and plant names.

Machine Translation Translation

BERT(s) to Detect Multiword Expressions

no code implementations 16 Aug 2022 Damith Premasiri, Tharindu Ranasinghe

Multiword expressions (MWEs) present groups of words in which the meaning of the whole is not derived from the meaning of its parts.

Machine Translation Translation

DTW at Qur'an QA 2022: Utilising Transfer Learning with Transformers for Question Answering in a Low-resource Domain

1 code implementation 12 May 2022 Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani, Ruslan Mitkov

The goal of the Qur'an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on Qur'an.

Ensemble Learning Machine Reading Comprehension +3

Biographical: A Semi-Supervised Relation Extraction Dataset

no code implementations 2 May 2022 Alistair Plum, Tharindu Ranasinghe, Spencer Jones, Constantin Orasan, Ruslan Mitkov

The dataset, which is aimed towards digital humanities (DH) and historical research, is automatically compiled by aligning sentences from Wikipedia articles with matching structured data from sources including Pantheon and Wikidata.

Knowledge Graphs named-entity-recognition +6

Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

1 code implementation WMT (EMNLP) 2021 Diptesh Kanojia, Marina Fomicheva, Tharindu Ranasinghe, Frédéric Blain, Constantin Orăsan, Lucia Specia

However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements.

Machine Translation Translation

FBERT: A Neural Transformer for Identifying Offensive Content

no code implementations Findings (EMNLP) 2021 Diptanu Sarkar, Marcos Zampieri, Tharindu Ranasinghe, Alexander Ororbia

Transformer-based models such as BERT, XLNET, and XLM-R have achieved state-of-the-art performance across various NLP tasks including the identification of offensive language and hate speech, an important problem in social media.

Language Identification XLM-R

Multilingual Offensive Language Identification for Low-resource Languages

no code implementations 12 May 2021 Tharindu Ranasinghe, Marcos Zampieri

We report results of 0.8415 F1 macro for Bengali in TRAC-2 shared task, 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020, 0.8568 F1 macro for Hindi in HASOC 2019 shared task and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval) showing that our approach compares favourably to the best systems submitted to recent shared tasks on these three languages.

Language Identification Transfer Learning +1

Transformers to Fight the COVID-19 Infodemic

1 code implementation NAACL (NLP4IF) 2021 Lasitha Uyangodage, Tharindu Ranasinghe, Hansi Hettiarachchi

NLP4IF-2021 shared task on fighting the COVID-19 infodemic has been organised to strengthen the research in false information detection where the participants are asked to predict seven different binary labels regarding false information in a tweet.

WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans

1 code implementation SEMEVAL 2021 Tharindu Ranasinghe, Diptanu Sarkar, Marcos Zampieri, Alexander Ororbia

In recent years, the widespread use of social media has led to an increase in the generation of toxic and offensive content on online platforms.

Toxic Spans Detection

TransWiC at SemEval-2021 Task 2: Transformer-based Multilingual and Cross-lingual Word-in-Context Disambiguation

no code implementations SEMEVAL 2021 Hansi Hettiarachchi, Tharindu Ranasinghe

Identifying whether a word carries the same meaning or different meaning in two contexts is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.

Information Retrieval Question Answering +2

Comparing Approaches to Dravidian Language Identification

no code implementations EACL (VarDial) 2021 Tommi Jauhiainen, Tharindu Ranasinghe, Marcos Zampieri

This paper describes the submissions by team HWR to the Dravidian Language Identification (DLI) shared task organized at VarDial 2021 workshop.

Dialect Identification text-classification +1

MUDES: Multilingual Detection of Offensive Spans

1 code implementation NAACL 2021 Tharindu Ranasinghe, Marcos Zampieri

The interest in offensive content identification in social media has grown substantially in recent years.

BRUMS at SemEval-2020 Task 12: Transformer Based Multilingual Offensive Language Identification in Social Media

no code implementations SEMEVAL 2020 Tharindu Ranasinghe, Hansi Hettiarachchi

In this paper, we describe the team BRUMS entry to OffensEval 2: Multilingual Offensive Language Identification in Social Media in SemEval-2020.

Language Identification

WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments

no code implementations 1 Nov 2020 Tharindu Ranasinghe, Sarthak Gupte, Marcos Zampieri, Ifeoma Nwogu

This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC) shared task 2020.

Language Identification Transfer Learning +1

TransQuest: Translation Quality Estimation with Cross-lingual Transformers

1 code implementation COLING 2020 Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures.

Sentence Transfer Learning +1

BRUMS at SemEval-2020 Task 12 : Transformer based Multilingual Offensive Language Identification in Social Media

no code implementations 13 Oct 2020 Tharindu Ranasinghe, Hansi Hettiarachchi

In this paper, we describe the team BRUMS entry to OffensEval 2: Multilingual Offensive Language Identification in Social Media in SemEval-2020.

Language Identification

InfoMiner at WNUT-2020 Task 2: Transformer-based Covid-19 Informative Tweet Extraction

1 code implementation EMNLP (WNUT) 2020 Hansi Hettiarachchi, Tharindu Ranasinghe

Identifying informative tweets is an important step when building information extraction systems based on social media.

Task 2

Multilingual Offensive Language Identification with Cross-lingual Embeddings

1 code implementation EMNLP 2020 Tharindu Ranasinghe, Marcos Zampieri

In this paper, we take advantage of English data available by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with fewer resources.

Language Identification Transfer Learning +1

TransQuest at WMT2020: Sentence-Level Direct Assessment

1 code implementation WMT (EMNLP) 2020 Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

This paper presents the team TransQuest's participation in Sentence-Level Direct Assessment shared task in WMT 2020.

Data Augmentation Sentence

Intelligent Translation Memory Matching and Retrieval with Sentence Encoders

no code implementations EAMT 2020 Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Matching and retrieving previously translated segments from a Translation Memory is the key functionality in Translation Memories systems.

Retrieval Sentence +1

Offensive Language Identification in Greek

1 code implementation LREC 2020 Zeses Pitenis, Marcos Zampieri, Tharindu Ranasinghe

As offensive language has become a rising issue for online communities and social media platforms, researchers have been investigating ways of coping with abusive content and developing systems to detect its different types: cyberbullying, hate speech, aggression, etc.

Language Identification

Toponym Detection in the Bio-Medical Domain: A Hybrid Approach with Deep Learning

no code implementations RANLP 2019 Alistair Plum, Tharindu Ranasinghe, Constantin Orasan

This paper compares how different machine learning classifiers can be used together with simple string matching and named entity recognition to detect locations in texts.

BIG-bench Machine Learning named-entity-recognition +4

Emoji Powered Capsule Network to Detect Type and Target of Offensive Posts in Social Media

no code implementations RANLP 2019 Hansi Hettiarachchi, Tharindu Ranasinghe

This paper describes a novel research approach to detect type and target of offensive posts in social media using a capsule network.

Semantic Textual Similarity with Siamese Neural Networks

no code implementations RANLP 2019 Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.

Information Retrieval Question Answering +3

Enhancing Unsupervised Sentence Similarity Methods with Deep Contextualised Word Representations

no code implementations RANLP 2019 Tharindu Ranasinghe, Constantin Orasan, Ruslan Mitkov

Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction.

Contextualised Word Representations Information Retrieval +6
