Search Results for author: Harish Tayyar Madabushi

Found 34 papers, 18 papers with code

CogNLP-Sheffield at CMCL 2021 Shared Task: Blending Cognitively Inspired Features with Transformer-based Language Models for Predicting Eye Tracking Patterns

no code implementations • NAACL (CMCL) 2021 • Peter Vickers, Rosa Wainwright, Harish Tayyar Madabushi, Aline Villavicencio

The CogNLP-Sheffield submissions to the CMCL 2021 Shared Task examine the value of a variety of cognitively and linguistically inspired features for predicting eye tracking patterns, as both standalone model inputs and as supplements to contextual word embeddings (XLNet).

Word Embeddings
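
A minimal sketch of the blending idea described above: concatenate handcrafted cognitive features with contextual embeddings and fit a regressor on an eye-tracking target. The arrays, dimensions, and the choice of ridge regression are illustrative stand-ins, not the submission's actual features or model.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens = 500
cognitive_feats = rng.random((n_tokens, 8))   # stand-ins for e.g. word length, frequency, surprisal
xlnet_embs = rng.random((n_tokens, 768))      # stand-in for per-token XLNet embeddings
y = rng.random(n_tokens)                      # stand-in for e.g. first-fixation duration

X = np.hstack([cognitive_feats, xlnet_embs])  # blended input: features supplement embeddings
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = Ridge().fit(X_tr, y_tr)
print("R^2 on held-out tokens:", model.score(X_te, y_te))
```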

Pre-Trained Language Models Represent Some Geographic Populations Better Than Others

no code implementations • 16 Mar 2024 • Jonathan Dunn, Benjamin Adams, Harish Tayyar Madabushi

This paper measures the skew in how well two families of LLMs represent diverse geographic populations.

Code-Mixed Probes Show How Pre-Trained Models Generalise On Code-Switched Text

1 code implementation • 7 Mar 2024 • Frances A. Laureano De Leon, Harish Tayyar Madabushi, Mark Lee

Code-switching is a prevalent linguistic phenomenon in which multilingual individuals seamlessly alternate between languages.

Standardize: Aligning Language Models with Expert-Defined Standards for Content Generation

no code implementations • 19 Feb 2024 • Joseph Marvin Imperial, Gail Forey, Harish Tayyar Madabushi

Domain experts across engineering, healthcare, and education follow strict standards for producing quality content such as technical manuals, medication instructions, and children's reading materials.

In-Context Learning, Retrieval, +1

Word Boundary Information Isn't Useful for Encoder Language Models

no code implementations • 15 Jan 2024 • Edward Gow-Smith, Dylan Phelps, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

Removing word boundary symbols has been shown to have a beneficial effect on the processing of morphologically complex words for transformer encoders in the pretrain-finetune paradigm.

NER, Sentence
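
A toy illustration of what removing boundary markers looks like, using WordPiece-style "##" continuation symbols as one example of such a marker (an assumption for illustration; the paper studies word boundary information more generally).

```python
wordpiece_tokens = ["un", "##believ", "##ably", "fast"]

def strip_boundary_markers(tokens):
    """Remove '##' continuation markers, discarding word-boundary information."""
    return [t.removeprefix("##") for t in tokens]

print(strip_boundary_markers(wordpiece_tokens))
# ['un', 'believ', 'ably', 'fast'] -- the model can no longer tell
# from the token itself which subwords begin a new word.
```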

Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models

1 code implementation • 11 Sep 2023 • Joseph Marvin Imperial, Harish Tayyar Madabushi

Readability metrics and standards such as the Flesch-Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators in properly assessing the complexity of educational materials before administering them for classroom use.
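
For reference, FKGL is a closed-form formula over word, sentence, and syllable counts. The sketch below uses the standard coefficients; the syllable counter is a rough vowel-group heuristic, not the exact counting procedure any particular tool or the paper uses.

```python
# FKGL = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count contiguous vowel groups, minimum one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fkgl(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

print(round(fkgl("The cat sat on the mat. It was happy."), 2))
```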

Are Emergent Abilities in Large Language Models just In-Context Learning?

1 code implementation • 4 Sep 2023 • Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych

Large language models have exhibited emergent abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those that require complex reasoning abilities.

In-Context Learning, Instruction Following

Construction Grammar and Language Models

no code implementations • 25 Aug 2023 • Harish Tayyar Madabushi, Laurence Romain, Petar Milin, Dagmar Divjak

In this chapter, we explore three distinct approaches to the interplay between computational methods and Construction Grammar: (i) computational methods for text analysis, (ii) computational Construction Grammar, and (iii) deep learning models, with a particular focus on language models.

Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5

1 code implementation • 31 Oct 2022 • Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Aline Villavicencio, Iryna Gurevych

We compare sequential fine-tuning with a model for multi-task learning in the context where we are interested in boosting performance on two tasks, one of which depends on the other.

Multi-Task Learning, Natural Language Inference
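
A compact sketch of the sequential fine-tuning setup compared above, using a toy shared encoder and two task heads (plain PyTorch with random tensors as stand-ins; the paper works with T5 on explanation generation and NLI).

```python
import torch
import torch.nn as nn

encoder = nn.Linear(16, 32)                 # stand-in for a shared encoder
head_a, head_b = nn.Linear(32, 2), nn.Linear(32, 2)
loss_fn = nn.CrossEntropyLoss()

def batch():                                # hypothetical data loader
    return torch.randn(8, 16), torch.randint(0, 2, (8,))

# Sequential fine-tuning: train on task A to completion, then on task B.
for head in (head_a, head_b):
    opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))
    for _ in range(10):
        x, y = batch()
        opt.zero_grad()
        loss_fn(head(encoder(x)), y).backward()
        opt.step()

# The multi-task alternative would instead alternate task A and task B
# batches within a single training run.
```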

Sample Efficient Approaches for Idiomaticity Detection

no code implementations • LREC (MWE) 2022 • Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

In particular we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection.
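
A PET-style zero-shot sketch of the pattern/verbalizer idea: wrap the input in a cloze pattern and compare masked-LM scores for verbalizer tokens. The pattern, the "yes"/"no" verbalizers, and the model choice are illustrative assumptions, not the paper's configuration.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

sentence = "He kicked the bucket last night."
pattern = f"{sentence} Is this phrase idiomatic? {tok.mask_token}."
inputs = tok(pattern, return_tensors="pt")
mask_pos = (inputs.input_ids == tok.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = mlm(**inputs).logits[0, mask_pos]
# Compare the verbalizer tokens' scores at the mask position.
yes, no = tok.convert_tokens_to_ids("yes"), tok.convert_tokens_to_ids("no")
print("idiomatic" if logits[yes] > logits[no] else "literal")
```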

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding

1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio

This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.

Binary Classification, Sentence, +4
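
A minimal sketch of the subtask (b) setup: a model that represents idiomatic expressions well should score the idiom closer to its intended meaning than to its literal reading. The model choice here is an arbitrary off-the-shelf encoder, not the task baseline.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
emb = model.encode([
    "He kicked the bucket last night.",   # potentially idiomatic sentence
    "He died last night.",                # intended (idiomatic) meaning
    "He struck the pail last night.",     # literal reading
])
print("vs. meaning:", util.cos_sim(emb[0], emb[1]).item())
print("vs. literal:", util.cos_sim(emb[0], emb[2]).item())
```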

Uniform Complexity for Text Generation

1 code implementation • 11 Apr 2022 • Joseph Marvin Imperial, Harish Tayyar Madabushi

Large language models (LLMs) have shown promising results in a wide array of generative NLP tasks, such as summarization and machine translation.

Machine Translation, Question Answering, +1

Improving Tokenisation by Alternative Treatment of Spaces

1 code implementation • 8 Apr 2022 • Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

We find that our modified algorithms lead to improved performance on downstream NLP tasks that involve handling complex words, whilst having no detrimental effect on performance in general natural language understanding tasks.

Natural Language Understanding
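
A toy contrast of the two treatments of spaces: gluing the space to the following word (GPT-2-style "Ġ" prefix) versus giving spaces their own tokens. This only illustrates the general idea, not the paper's modified algorithms.

```python
text = "overestimate the cost"

gpt2_style = ["over", "estimate", "Ġthe", "Ġcost"]              # space glued to next word
space_separate = ["over", "estimate", " ", "the", " ", "cost"]  # space is its own token

# With spaces as separate tokens, a subword like "estimate" is the same
# token whether or not it starts a word, which can help when segmenting
# morphologically complex words.
print(gpt2_style, space_separate, sep="\n")
```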

Learned Construction Grammars Converge Across Registers Given Increased Exposure

no code implementations • CoNLL (EMNLP) 2021 • Jonathan Dunn, Harish Tayyar Madabushi

These simulations are repeated with increasing amounts of exposure, from 100k to 2 million words, to measure the impact of exposure on the convergence of grammars.

AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models

1 code implementation • Findings (EMNLP) 2021 • Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio

Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail to effectively capture the meanings of multiword expressions (MWEs), especially idioms.

Language Modelling

CxGBERT: BERT meets Construction Grammar

1 code implementation • COLING 2020 • Harish Tayyar Madabushi, Laurence Romain, Dagmar Divjak, Petar Milin

BERT's training objectives give it access to a tremendous amount of lexico-semantic information, and while BERTology has shown that BERT captures certain important linguistic dimensions, there have been no studies exploring the extent to which BERT might have access to constructional information.
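
One standard way to test whether a model "has access to" such information is a probing classifier: freeze the encoder, extract representations, and fit a linear probe on a labeled property. The sketch below shows that general recipe with hypothetical sentences and construction labels; it is not the paper's experimental design.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["She sneezed the napkin off the table.",  # caused-motion
             "He gave her the book."]                  # ditransitive
labels = [0, 1]                                        # hypothetical construction ids

with torch.no_grad():
    enc = tok(sentences, padding=True, return_tensors="pt")
    cls = bert(**enc).last_hidden_state[:, 0].numpy()  # frozen [CLS] features

probe = LogisticRegression().fit(cls, labels)          # linear probe on frozen features
print(probe.predict(cls))
```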

UoB at SemEval-2020 Task 12: Boosting BERT with Corpus Level Information

no code implementations • SEMEVAL 2020 • Wah Meng Lim, Harish Tayyar Madabushi

Pre-trained language model word representations, such as BERT, have been extremely successful in several Natural Language Processing tasks, significantly improving on the state-of-the-art.

Abuse Detection, Language Modelling, +1

Augmenting Neural Metaphor Detection with Concreteness

no code implementations • WS 2020 • Ghadi Alnafesah, Harish Tayyar Madabushi, Mark Lee

The idea that a shift in concreteness within a sentence indicates the presence of a metaphor has been around for a while.

Sentence
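
A minimal sketch of the concreteness-shift cue: flag adjacent content words whose concreteness ratings differ sharply. The tiny lexicon and threshold below are hypothetical stand-ins; work in this area typically draws ratings from resources such as the Brysbaert concreteness norms.

```python
concreteness = {"grasp": 4.0, "idea": 1.6, "hold": 4.1, "cup": 4.9}  # hypothetical ratings

def concreteness_shifts(tokens, threshold=2.0):
    rated = [(t, concreteness[t]) for t in tokens if t in concreteness]
    return [(a, b) for (a, ca), (b, cb) in zip(rated, rated[1:])
            if abs(ca - cb) >= threshold]

print(concreteness_shifts(["grasp", "the", "idea"]))  # [('grasp', 'idea')] -> metaphor cue
print(concreteness_shifts(["hold", "the", "cup"]))    # [] -> literal
```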

CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word embeddings for sentiment analysis

1 code implementation • SEMEVAL 2020 • Frances Adriana Laureano De Leon, Florimond Guéniat, Harish Tayyar Madabushi

The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts written in multiple languages, a practice known as code-switching.

Multilingual Word Embeddings, Sentiment Analysis
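
A minimal sketch of training word embeddings directly on code-switched text with gensim's word2vec. The three-sentence corpus is a toy stand-in; the paper trains on a large corpus of code-switched tweets.

```python
from gensim.models import Word2Vec

code_switched = [                      # toy Spanish-English code-switched corpus
    ["i", "love", "this", "cancion"],
    ["esta", "song", "is", "amazing"],
    ["que", "cool", "this", "is"],
]
model = Word2Vec(sentences=code_switched, vector_size=50, window=3,
                 min_count=1, epochs=50, seed=0)
print(model.wv.most_similar("cancion", topn=2))
```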

Cost-Sensitive BERT for Generalisable Sentence Classification with Imbalanced Data

1 code implementation • 16 Mar 2020 • Harish Tayyar Madabushi, Elena Kochkina, Michael Castelle

The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed.

Data Augmentation, General Classification, +5
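
A minimal sketch of cost-sensitive training on imbalanced data: weight the cross-entropy loss by inverse class frequency so the rare class is not ignored. Shown with a plain classification head and illustrative counts; this demonstrates the general technique, not the paper's exact weighting scheme.

```python
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 100.0])            # imbalanced class counts (illustrative)
weights = counts.sum() / (len(counts) * counts)  # inverse-frequency class weights
loss_fn = nn.CrossEntropyLoss(weight=weights)    # misclassifying the rare class costs more

logits = torch.randn(8, 2)                       # stand-in for classifier outputs
labels = torch.randint(0, 2, (8,))
print(loss_fn(logits, labels))
```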

Cost-Sensitive BERT for Generalisable Sentence Classification on Imbalanced Data

no code implementations • WS 2019 • Harish Tayyar Madabushi, Elena Kochkina, Michael Castelle

The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed.

Data Augmentation, General Classification, +5

Integrating Question Classification and Deep Learning for improved Answer Selection

no code implementations • COLING 2018 • Harish Tayyar Madabushi, Mark Lee, John Barnden

We present a system for Answer Selection that integrates fine-grained Question Classification with a Deep Learning model designed for Answer Selection.

Answer Selection, Classification, +1
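
A toy sketch of the integration idea: feed a one-hot fine-grained question class alongside question and answer representations into an answer-scoring head. All dimensions and the scoring head are illustrative assumptions; the paper's model is far richer.

```python
import torch
import torch.nn as nn

n_classes, emb_dim = 50, 128                  # toy: fine-grained classes, embedding size
scorer = nn.Sequential(nn.Linear(2 * emb_dim + n_classes, 64),
                       nn.ReLU(), nn.Linear(64, 1))

q_emb, a_emb = torch.randn(1, emb_dim), torch.randn(1, emb_dim)  # stand-in encodings
q_class = nn.functional.one_hot(torch.tensor([7]), n_classes).float()
score = scorer(torch.cat([q_emb, a_emb, q_class], dim=-1))       # class-aware answer score
print(score)
```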

High Accuracy Rule-based Question Classification using Question Syntax and Semantics

no code implementations • COLING 2016 • Harish Tayyar Madabushi, Mark Lee

We present in this paper a purely rule-based system for Question Classification, which we divide into two parts: the first extracts relevant words from a question by use of its structure, and the second classifies questions based on rules that associate these words with Concepts.

BIG-bench Machine Learning, General Classification, +3
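
A minimal sketch of that two-stage rule-based pipeline: a structural rule picks out the question's relevant word, and a rule table maps it to a Concept. Both the extraction rule and the concept map below are toy stand-ins for the paper's much larger rule sets.

```python
CONCEPTS = {"city": "LOCATION", "author": "PERSON", "year": "DATE"}  # toy rule table

def relevant_word(question: str) -> str:
    # Toy structural rule: for "What/Which X ...", take the noun after the wh-word.
    tokens = question.lower().strip("?").split()
    return tokens[1] if tokens[0] in {"what", "which"} else tokens[0]

def classify(question: str) -> str:
    return CONCEPTS.get(relevant_word(question), "OTHER")

print(classify("What city hosted the 2012 Olympics?"))  # -> LOCATION
```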
