Search Results for author: Harish Tayyar Madabushi

Found 34 papers, 18 papers with code

CogNLP-Sheffield at CMCL 2021 Shared Task: Blending Cognitively Inspired Features with Transformer-based Language Models for Predicting Eye Tracking Patterns

no code implementations • NAACL (CMCL) 2021 • Peter Vickers, Rosa Wainwright, Harish Tayyar Madabushi, Aline Villavicencio

The CogNLP-Sheffield submissions to the CMCL 2021 Shared Task examine the value of a variety of cognitively and linguistically inspired features for predicting eye tracking patterns, as both standalone model inputs and as supplements to contextual word embeddings (XLNet).

Word Embeddings
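
A minimal sketch of the blending idea described above: concatenate handcrafted cognitive features with contextual embeddings and fit a regressor on an eye-tracking target. The arrays, dimensions, and the choice of ridge regression are illustrative stand-ins, not the submission's actual features or model.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens = 500
cognitive_feats = rng.random((n_tokens, 8))   # stand-ins for e.g. word length, frequency, surprisal
xlnet_embs = rng.random((n_tokens, 768))      # stand-in for per-token XLNet embeddings
y = rng.random(n_tokens)                      # stand-in for e.g. first-fixation duration

X = np.hstack([cognitive_feats, xlnet_embs])  # blended input: features supplement embeddings
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = Ridge().fit(X_tr, y_tr)
print("R^2 on held-out tokens:", model.score(X_te, y_te))
```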

Pre-Trained Language Models Represent Some Geographic Populations Better Than Others

no code implementations • 16 Mar 2024 • Jonathan Dunn, Benjamin Adams, Harish Tayyar Madabushi

This paper measures the skew in how well two families of LLMs represent diverse geographic populations.

Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text

1 code implementation • 7 Mar 2024 • Frances A. Laureano De Leon, Harish Tayyar Madabushi, Mark Lee

Code-switching is a prevalent linguistic phenomenon in which multilingual individuals seamlessly alternate between languages.

Standardize: Aligning Language Models with Expert-Defined Standards for Content Generation

no code implementations • 19 Feb 2024 • Joseph Marvin Imperial, Gail Forey, Harish Tayyar Madabushi

Domain experts across engineering, healthcare, and education follow strict standards for producing quality content such as technical manuals, medication instructions, and children's reading materials.

In-Context Learning, Retrieval, +1

Word Boundary Information Isn't Useful for Encoder Language Models

no code implementations • 15 Jan 2024 • Edward Gow-Smith, Dylan Phelps, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

Removing word boundary symbols has been shown to have a beneficial effect on the processing of morphologically complex words for transformer encoders in the pretrain-finetune paradigm.

NER, Sentence
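
A toy illustration of what removing boundary markers looks like, using WordPiece-style "##" continuation symbols as one example of such a marker (an assumption for illustration; the paper studies word boundary information more generally).

```python
wordpiece_tokens = ["un", "##believ", "##ably", "fast"]

def strip_boundary_markers(tokens):
    """Remove '##' continuation markers, discarding word-boundary information."""
    return [t.removeprefix("##") for t in tokens]

print(strip_boundary_markers(wordpiece_tokens))
# ['un', 'believ', 'ably', 'fast'] -- the model can no longer tell
# from the token itself which subwords begin a new word.
```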

Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models

1 code implementation • 11 Sep 2023 • Joseph Marvin Imperial, Harish Tayyar Madabushi

Readability metrics and standards such as the Flesch-Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators in properly assessing the complexity of educational materials before administering them for classroom use.
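
For reference, FKGL is a closed-form formula over word, sentence, and syllable counts. The sketch below uses the standard coefficients; the syllable counter is a rough vowel-group heuristic, not the exact counting procedure any particular tool or the paper uses.

```python
# FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count contiguous vowel groups, minimum one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(round(fkgl("The cat sat on the mat. It was happy."), 2))
```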

Are Emergent Abilities in Large Language Models just In-Context Learning?

1 code implementation • 4 Sep 2023 • Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych

Large language models have exhibited emergent abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those that require complex reasoning abilities.

In-Context Learning, Instruction Following

Construction Grammar and Language Models

no code implementations • 25 Aug 2023 • Harish Tayyar Madabushi, Laurence Romain, Petar Milin, Dagmar Divjak

In this chapter, we explore three distinct approaches to the interplay between computational methods and Construction Grammar: (i) computational methods for text analysis, (ii) computational Construction Grammar, and (iii) deep learning models, with a particular focus on language models.

Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5

1 code implementation • 31 Oct 2022 • Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Aline Villavicencio, Iryna Gurevych

We compare sequential fine-tuning with a model for multi-task learning in the context where we are interested in boosting performance on two tasks, one of which depends on the other.

Multi-Task Learning, Natural Language Inference
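
A compact sketch of the sequential fine-tuning setup compared above, using a toy shared encoder and two task heads (plain PyTorch with random tensors as stand-ins; the paper works with T5 on explanation generation and NLI).

```python
import torch
import torch.nn as nn

encoder = nn.Linear(16, 32)                 # stand-in for a shared encoder
head_a, head_b = nn.Linear(32, 2), nn.Linear(32, 2)
loss_fn = nn.CrossEntropyLoss()

def batch():                                # hypothetical data loader
    return torch.randn(8, 16), torch.randint(0, 2, (8,))

# Sequential fine-tuning: train on task A to completion, then on task B.
for head in (head_a, head_b):
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))
    for _ in range(10):
        x, y = batch()
        opt.zero_grad()
        loss_fn(head(encoder(x)), y).backward()
        opt.step()

# The multi-task alternative would instead alternate task A and task B
# batches within a single training run.
```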

Sample Efficient Approaches for Idiomaticity Detection

no code implementations • LREC (MWE) 2022 • Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

In particular we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection.
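
A PET-style zero-shot sketch of the pattern/verbalizer idea: wrap the input in a cloze pattern and compare masked-LM scores for verbalizer tokens. The pattern, the "yes"/"no" verbalizers, and the model choice are illustrative assumptions, not the paper's configuration.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

sentence = "He kicked the bucket last night."
pattern = f"{sentence} Is this phrase idiomatic? {tok.mask_token}."
inputs = tok(pattern, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = mlm(**inputs).logits[0, mask_pos]
# Compare the verbalizer tokens' scores at the mask position.
yes, no = tok.convert_tokens_to_ids("yes"), tok.convert_tokens_to_ids("no")
print("idiomatic" if logits[yes] > logits[no] else "literal")
```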

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding

1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.

Binary Classification, Sentence, +4
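
A minimal sketch of the subtask (b) setup: a model that represents idiomatic expressions well should score the idiom closer to its intended meaning than to its literal reading. The model choice here is an arbitrary off-the-shelf encoder, not the task baseline.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([
    "He kicked the bucket last night.",   # potentially idiomatic sentence
    "He died last night.",                # intended (idiomatic) meaning
    "He struck the pail last night.",     # literal reading
])
print("vs. meaning:", util.cos_sim(emb[0], emb[1]).item())
print("vs. literal:", util.cos_sim(emb[0], emb[2]).item())
```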

Uniform Complexity for Text Generation

1 code implementation • 11 Apr 2022 • Joseph Marvin Imperial, Harish Tayyar Madabushi

Large language models (LLMs) have shown promising results in a wide array of generative NLP tasks, such as summarization and machine translation.

Machine Translation, Question Answering, +1

Improving Tokenisation by Alternative Treatment of Spaces

1 code implementation • 8 Apr 2022 • Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

We find that our modified algorithms lead to improved performance on downstream NLP tasks that involve handling complex words, whilst having no detrimental effect on performance in general natural language understanding tasks.

Natural Language Understanding
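
A toy contrast of the two treatments of spaces: gluing the space to the following word (GPT-2-style "Ġ" prefix) versus giving spaces their own tokens. This only illustrates the general idea, not the paper's modified algorithms.

```python
text = "overestimate the cost"

gpt2_style = ["over", "estimate", "Ġthe", "Ġcost"]              # space glued to next word
space_separate = ["over", "estimate", " ", "the", " ", "cost"]  # space is its own token

# With spaces as separate tokens, a subword like "estimate" is the same
# token whether or not it starts a word, which can help when segmenting
# morphologically complex words.
print(gpt2_style, space_separate, sep="\n")
```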

Learned Construction Grammars Converge Across Registers Given Increased Exposure

no code implementations • CoNLL (EMNLP) 2021 • Jonathan Dunn, Harish Tayyar Madabushi

These simulations are repeated with increasing amounts of exposure, from 100k to 2 million words, to measure the impact of exposure on the convergence of grammars.

AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models

1 code implementation • Findings (EMNLP) 2021 • Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio

Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail to effectively capture the meanings of multiword expressions (MWEs), especially idioms.

Language Modelling

CxGBERT: BERT meets Construction Grammar

1 code implementation • COLING 2020 • Harish Tayyar Madabushi, Laurence Romain, Dagmar Divjak, Petar Milin

BERT's training objectives give it access to a tremendous amount of lexico-semantic information, and while BERTology has shown that BERT captures certain important linguistic dimensions, there have been no studies exploring the extent to which BERT might have access to constructional information.
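
One standard way to test whether a model "has access to" such information is a probing classifier: freeze the encoder, extract representations, and fit a linear probe on a labeled property. The sketch below shows that general recipe with hypothetical sentences and construction labels; it is not the paper's experimental design.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["She sneezed the napkin off the table.",  # caused-motion
             "He gave her the book."]                  # ditransitive
labels = [0, 1]                                        # hypothetical construction ids

with torch.no_grad():
    enc = tok(sentences, padding=True, return_tensors="pt")
    cls = bert(**enc).last_hidden_state[:, 0].numpy()  # frozen [CLS] features

probe = LogisticRegression().fit(cls, labels)          # linear probe on frozen features
print(probe.predict(cls))
```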

UoB at SemEval-2020 Task 12: Boosting BERT with Corpus Level Information

no code implementations • SEMEVAL 2020 • Wah Meng Lim, Harish Tayyar Madabushi

Pre-trained language model word representations, such as BERT, have been extremely successful in several Natural Language Processing tasks, significantly improving on the state-of-the-art.

Abuse Detection, Language Modelling, +1

Augmenting Neural Metaphor Detection with Concreteness

no code implementations • WS 2020 • Ghadi Alnafesah, Harish Tayyar Madabushi, Mark Lee

The idea that a shift in concreteness within a sentence indicates the presence of a metaphor has been around for a while.

Sentence
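
A minimal sketch of the concreteness-shift cue: flag adjacent content words whose concreteness ratings differ sharply. The tiny lexicon and threshold below are hypothetical stand-ins; work in this area typically draws ratings from resources such as the Brysbaert concreteness norms.

```python
concreteness = {"grasp": 4.0, "idea": 1.6, "hold": 4.1, "cup": 4.9}  # hypothetical ratings

def concreteness_shifts(tokens, threshold=2.0):
    rated = [(t, concreteness[t]) for t in tokens if t in concreteness]
    return [(a, b) for (a, ca), (b, cb) in zip(rated, rated[1:])
            if abs(ca - cb) >= threshold]

print(concreteness_shifts(["grasp", "the", "idea"]))  # [('grasp', 'idea')] -> metaphor cue
print(concreteness_shifts(["hold", "the", "cup"]))    # [] -> literal
```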

CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis

1 code implementation • SEMEVAL 2020 • Frances Adriana Laureano De Leon, Florimond Guéniat, Harish Tayyar Madabushi

The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in multiple languages, a practice known as code-switching.

Multilingual Word Embeddings, Sentiment Analysis
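
A minimal sketch of training word embeddings directly on code-switched text with gensim's word2vec. The three-sentence corpus is a toy stand-in; the paper trains on a large corpus of code-switched tweets.

```python
from gensim.models import Word2Vec

code_switched = [                      # toy Spanish-English code-switched corpus
    ["i", "love", "this", "cancion"],
    ["esta", "song", "is", "amazing"],
    ["que", "cool", "this", "is"],
]
model = Word2Vec(sentences=code_switched, vector_size=50, window=3,
                 min_count=1, epochs=50, seed=0)
print(model.wv.most_similar("cancion", topn=2))
```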

Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data

1 code implementation • 16 Mar 2020 • Harish Tayyar Madabushi, Elena Kochkina, Michael Castelle

The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed.

Data Augmentation, General Classification, +5
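
A minimal sketch of cost-sensitive training on imbalanced data: weight the cross-entropy loss by inverse class frequency so the rare class is not ignored. Shown with a plain classification head and illustrative counts; this demonstrates the general technique, not the paper's exact weighting scheme.

```python
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 100.0])            # imbalanced class counts (illustrative)
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency class weights
loss_fn = nn.CrossEntropyLoss(weight=weights)    # misclassifying the rare class costs more

logits = torch.randn(8, 2)                       # stand-in for classifier outputs
labels = torch.randint(0, 2, (8,))
print(loss_fn(logits, labels))
```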

Cost-Sensitive BERT for Generalisable Sentence Classification on Imbalanced Data

no code implementations • WS 2019 • Harish Tayyar Madabushi, Elena Kochkina, Michael Castelle

The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed.

Data Augmentation, General Classification, +5

Integrating Question Classification and Deep Learning for improved Answer Selection

no code implementations • COLING 2018 • Harish Tayyar Madabushi, Mark Lee, John Barnden

We present a system for Answer Selection that integrates fine-grained Question Classification with a Deep Learning model designed for Answer Selection.

Answer Selection, Classification, +1
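
A toy sketch of the integration idea: feed a one-hot fine-grained question class alongside question and answer representations into an answer-scoring head. All dimensions and the scoring head are illustrative assumptions; the paper's model is far richer.

```python
import torch
import torch.nn as nn

n_classes, emb_dim = 50, 128                  # toy: fine-grained classes, embedding size
scorer = nn.Sequential(nn.Linear(2 * emb_dim + n_classes, 64),
                       nn.ReLU(), nn.Linear(64, 1))

q_emb, a_emb = torch.randn(1, emb_dim), torch.randn(1, emb_dim)  # stand-in encodings
q_class = nn.functional.one_hot(torch.tensor([7]), n_classes).float()
score = scorer(torch.cat([q_emb, a_emb, q_class], dim=-1))       # class-aware answer score
print(score)
```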

High Accuracy Rule-based Question Classification using Question Syntax and Semantics

no code implementations • COLING 2016 • Harish Tayyar Madabushi, Mark Lee

We present in this paper a purely rule-based system for Question Classification, which we divide into two parts: the first extracts relevant words from a question by use of its structure, and the second classifies questions based on rules that associate these words with Concepts.

BIG-bench Machine Learning, General Classification, +3
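
A minimal sketch of that two-stage rule-based pipeline: a structural rule picks out the question's relevant word, and a rule table maps it to a Concept. Both the extraction rule and the concept map below are toy stand-ins for the paper's much larger rule sets.

```python
CONCEPTS = {"city": "LOCATION", "author": "PERSON", "year": "DATE"}  # toy rule table

def relevant_word(question: str) -> str:
    # Toy structural rule: for "What/Which X ...", take the noun after the wh-word.
    tokens = question.lower().strip("?").split()
    return tokens[1] if tokens[0] in {"what", "which"} else tokens[0]

def classify(question: str) -> str:
    return CONCEPTS.get(relevant_word(question), "OTHER")

print(classify("What city hosted the 2012 Olympics?"))  # -> LOCATION
```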
