no code implementations • NAACL (SMM4H) 2021 • Frances Adriana Laureano De Leon, Harish Tayyar Madabushi, Mark Lee
This paper describes the participation of the UoB-NLP team in the ProfNER-ST shared subtask 7a.
no code implementations • EMNLP (WNUT) 2020 • Calum Perrio, Harish Tayyar Madabushi
This paper presents our submission to Task 2 of the Workshop on Noisy User-generated Text.
no code implementations • NAACL (CMCL) 2021 • Peter Vickers, Rosa Wainwright, Harish Tayyar Madabushi, Aline Villavicencio
The CogNLP-Sheffield submissions to the CMCL 2021 Shared Task examine the value of a variety of cognitively and linguistically inspired features for predicting eye tracking patterns, as both standalone model inputs and as supplements to contextual word embeddings (XLNet).
no code implementations • 16 Mar 2024 • Jonathan Dunn, Benjamin Adams, Harish Tayyar Madabushi
This paper measures the skew in how well two families of LLMs represent diverse geographic populations.
1 code implementation • 7 Mar 2024 • Frances A. Laureano De Leon, Harish Tayyar Madabushi, Mark Lee
Code-switching is a prevalent linguistic phenomenon in which multilingual individuals seamlessly alternate between languages.
no code implementations • 19 Feb 2024 • Joseph Marvin Imperial, Gail Forey, Harish Tayyar Madabushi
Domain experts across engineering, healthcare, and education follow strict standards for producing quality content such as technical manuals, medication instructions, and children's reading materials.
no code implementations • 15 Jan 2024 • Edward Gow-Smith, Dylan Phelps, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
As such, removing these symbols has been shown to have a beneficial effect on the processing of morphologically complex words for transformer encoders in the pretrain-finetune paradigm.
1 code implementation • 11 Sep 2023 • Joseph Marvin Imperial, Harish Tayyar Madabushi
Readability metrics and standards such as the Flesch-Kincaid Grade Level (FKGL) and the Common European Framework of Reference for Languages (CEFR) exist to guide teachers and educators in properly assessing the complexity of educational materials before administering them for classroom use.
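The FKGL metric mentioned above is a fixed linear formula over surface statistics (words per sentence and syllables per word). A minimal sketch follows; the crude vowel-group heuristic stands in for a proper syllable counter, so scores are approximate:

```python
import re

def fkgl(text):
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word):
        # Crude heuristic: count vowel groups; every word has >= 1 syllable.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (total_syllables / len(words))
            - 15.59)
```

Short, monosyllabic sentences score near (or below) zero, while long sentences with polysyllabic words score far higher, which is the behaviour teachers rely on when grading material difficulty.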
1 code implementation • 4 Sep 2023 • Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych
Large language models have exhibited emergent abilities, demonstrating exceptional performance across diverse tasks for which they were not explicitly trained, including those that require complex reasoning abilities.
no code implementations • 25 Aug 2023 • Harish Tayyar Madabushi, Laurence Romain, Petar Milin, Dagmar Divjak
In this chapter, we explore three distinct approaches to the interplay between computational methods and Construction Grammar: (i) computational methods for text analysis, (ii) computational Construction Grammar, and (iii) deep learning models, with a particular focus on language models.
1 code implementation • 31 Oct 2022 • Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Aline Villavicencio, Iryna Gurevych
We compare sequential fine-tuning with a multi-task learning model in a setting where we aim to boost performance on two tasks, one of which depends on the other.
1 code implementation • NAACL 2022 • Harish Tayyar Madabushi, Dagmar Divjak, Petar Milin
Article prediction is a task that has long defied accurate linguistic description.
no code implementations • LREC (MWE) 2022 • Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
In particular we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection.
1 code implementation • SemEval (NAACL) 2022 • Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially idiomatic expressions in context.
1 code implementation • 11 Apr 2022 • Joseph Marvin Imperial, Harish Tayyar Madabushi
Large language models (LLMs) have shown promising results in a wide array of generative NLP tasks, such as summarization and machine translation.
1 code implementation • 8 Apr 2022 • Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio
We find that our modified algorithms lead to improved performance on downstream NLP tasks that involve handling complex words, whilst having no detrimental effect on performance in general natural language understanding tasks.
no code implementations • CoNLL (EMNLP) 2021 • Jonathan Dunn, Harish Tayyar Madabushi
These simulations are repeated with increasing amounts of exposure, from 100k to 2 million words, to measure the impact of exposure on the convergence of grammars.
1 code implementation • SEMEVAL 2021 • Erik Yan, Harish Tayyar Madabushi
Toxicity is pervasive in social media and poses a major threat to the health of online communities.
1 code implementation • Findings (EMNLP) 2021 • Harish Tayyar Madabushi, Edward Gow-Smith, Carolina Scarton, Aline Villavicencio
Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail in effectively capturing the meanings of multiword expressions (MWEs), especially idioms.
no code implementations • SEMEVAL 2021 • Wei Li, Harish Tayyar Madabushi, Mark Lee
This paper describes our submission to SemEval 2021 Task 2.
1 code implementation • COLING 2020 • Harish Tayyar Madabushi, Laurence Romain, Dagmar Divjak, Petar Milin
BERT's training objectives give it access to a tremendous amount of lexico-semantic information, and while BERTology has shown that BERT captures certain important linguistic dimensions, there have been no studies exploring the extent to which BERT might have access to constructional information.
1 code implementation • SEMEVAL 2020 • Eleri Sarsfield, Harish Tayyar Madabushi
Much as the social landscape in which languages are spoken shifts, language too evolves to suit the needs of its users.
1 code implementation • NLP4IF (COLING) 2020 • Anushka Prakash, Harish Tayyar Madabushi
The explosive growth and popularity of Social Media has revolutionised the way we communicate and collaborate.
1 code implementation • 15 Oct 2020 • Calum Perrio, Harish Tayyar Madabushi
This paper presents our submission to Task 2 of the Workshop on Noisy User-generated Text.
no code implementations • SEMEVAL 2020 • Wah Meng Lim, Harish Tayyar Madabushi
Pre-trained language model word representations, such as those produced by BERT, have been extremely successful in several Natural Language Processing tasks, significantly improving on the state of the art.
no code implementations • WS 2020 • Ghadi Alnafesah, Harish Tayyar Madabushi, Mark Lee
The idea that a shift in concreteness within a sentence indicates the presence of a metaphor has been around for a while.
1 code implementation • SEMEVAL 2020 • Frances Adriana Laureano De Leon, Florimond Guéniat, Harish Tayyar Madabushi
The growing popularity and applications of sentiment analysis of social media posts have naturally led to sentiment analysis of posts that mix multiple languages, a phenomenon known as code-switching.
1 code implementation • 16 Mar 2020 • Harish Tayyar Madabushi, Elena Kochkina, Michael Castelle
The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed.
1 code implementation • 8 Mar 2020 • Petar Milin, Harish Tayyar Madabushi, Michael Croucher, Dagmar Divjak
In this paper we present the Widrow-Hoff rule and its applications to language data.
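The Widrow-Hoff (delta, or least-mean-squares) rule nudges a weight vector to reduce squared prediction error after each cue-outcome pairing. A minimal sketch on toy data; the cues, targets, and learning rate here are illustrative, not taken from the paper:

```python
def widrow_hoff_update(w, x, target, eta=0.1):
    """One Widrow-Hoff update: w <- w + eta * (target - w.x) * x."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    err = target - pred
    return [wi + eta * err * xi for wi, xi in zip(w, x)]

# Toy task: learn that the outcome tracks cue 1 and ignores cue 2.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0),
        ([1.0, 1.0], 1.0), ([0.0, 0.0], 0.0)]
w = [0.0, 0.0]
for _ in range(200):          # repeated exposure to the same pairings
    for x, t in data:
        w = widrow_hoff_update(w, x, t, eta=0.2)
```

After training, the weights approach [1, 0]: the informative cue earns a strong association while the uninformative one is driven toward zero, which is the error-driven behaviour that makes the rule attractive for modelling language learning.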
no code implementations • WS 2019 • Harish Tayyar Madabushi, Elena Kochkina, Michael Castelle
The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed.
1 code implementation • LREC 2020 • Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark
Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately.
no code implementations • COLING 2018 • Harish Tayyar Madabushi, Mark Lee, John Barnden
We present a system for Answer Selection that integrates fine-grained Question Classification with a Deep Learning model designed for Answer Selection.
no code implementations • COLING 2016 • Harish Tayyar Madabushi, Mark Lee
We present in this paper a purely rule-based system for Question Classification which we divide into two parts: The first is the extraction of relevant words from a question by use of its structure, and the second is the classification of questions based on rules that associate these words to Concepts.
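As a rough illustration of that two-part design (structural word extraction, then word-to-Concept rules), here is a toy sketch; the triggers and Concept labels below are invented for illustration and are not the paper's actual rule set:

```python
# Hypothetical trigger-to-Concept associations (illustrative only).
CONCEPT_RULES = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
    "how many": "NUMBER",
    "what city": "LOCATION",
}

def classify(question):
    """Map a question's leading wh-phrase to a Concept label."""
    q = question.lower()
    # Longest trigger first, so "how many" wins over a bare "how".
    for trigger in sorted(CONCEPT_RULES, key=len, reverse=True):
        if q.startswith(trigger):
            return CONCEPT_RULES[trigger]
    return "OTHER"
```

A real system of this kind would extract relevant words from deeper syntactic structure rather than just the question prefix, but the sketch shows why purely rule-based classification can be competitive: the mapping from question form to answer type is highly regular.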
Ranked #1 on Text Classification on TREC-50