Search Results for author: Bertie Vidgen

Found 30 papers, 15 papers with code

Recalibrating classifiers for interpretable abusive content detection

no code implementations • EMNLP (NLP+CSS) 2020 • Bertie Vidgen, Scott Hale, Sam Staton, Tom Melham, Helen Margetts, Ohad Kammar, Marcin Szymczak

We investigate the use of machine learning classifiers for detecting online abuse in empirical research.

Probabilistic Programming

Paper
Add Code

Findings of the WOAH 5 Shared Task on Fine Grained Hateful Memes Detection

no code implementations • ACL (WOAH) 2021 • Lambert Mathias, Shaoliang Nie, Aida Mostafazadeh Davani, Douwe Kiela, Vinodkumar Prabhakaran, Bertie Vidgen, Zeerak Waseem

We present the results and main findings of the shared task at WOAH 5 on hateful memes detection.

Paper
Add Code

Online Abuse and Human Rights: WOAH Satellite Session at RightsCon 2020

no code implementations • EMNLP (ALW) 2020 • Vinodkumar Prabhakaran, Zeerak Waseem, Seyi Akiwowo, Bertie Vidgen

In 2020 The Workshop on Online Abuse and Harms (WOAH) held a satellite panel at RightsCons 2020, an international human rights conference.

Paper
Add Code

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

1 code implementation • 24 Apr 2024 • Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

Human feedback plays a central role in the alignment of Large Language Models (LLMs).

Navigate

Paper
Code

Introducing v0.5 of the AI Safety Benchmark from MLCommons

1 code implementation • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren

We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0. 5 benchmark.

Paper
Code

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

2 code implementations • 8 Apr 2024 • Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

Researchers and practitioners have met these concerns by introducing an abundance of new datasets for evaluating and improving LLM safety.

Language Modelling Large Language Model

159

Paper
Code

TrustLLM: Trustworthiness in Large Language Models

1 code implementation • 10 Jan 2024 • Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao liu, Heng Ji, Hongyi Wang, huan zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao

This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions.

Ethics Fairness

273

Paper
Code

FinanceBench: A New Benchmark for Financial Question Answering

1 code implementation • 20 Nov 2023 • Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen

We test 16 state of the art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2, 400).

Question Answering Retrieval +1

Paper
Code

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

no code implementations • 14 Nov 2023 • Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger

While some of the models do not give a single unsafe response, most give unsafe responses to more than 20% of the prompts, with over 50% unsafe responses in the extreme.

Paper
Add Code

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

no code implementations • 11 Oct 2023 • Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale

Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs).

Paper
Add Code

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

no code implementations • 3 Oct 2023 • Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers.

Paper
Add Code

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

1 code implementation • 2 Aug 2023 • Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy

In this paper, we introduce a new test suite called XSTest to identify such eXaggerated Safety behaviours in a systematic way.

Language Modelling

Paper
Code

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

no code implementations • 9 Mar 2023 • Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in coming years due to integration in product interfaces like ChatGPT or search engines like Bing.

Paper
Add Code

SemEval-2023 Task 10: Explainable Detection of Online Sexism

1 code implementation • 7 Mar 2023 • Hannah Rose Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger

Online sexism is a widespread and harmful phenomenon.

Paper
Code

Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning

1 code implementation • TRAC (COLING) 2022 • Hannah Rose Kirk, Bertie Vidgen, Scott A. Hale

Annotating abusive language is expensive, logistically complex and creates a risk of psychological harm.

Abusive Language Active Learning

Paper
Code

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

1 code implementation • NAACL (WOAH) 2022 • Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen

To help address this issue, we introduce Multilingual HateCheck (MHC), a suite of functional tests for multilingual hate speech detection models.

Hate Speech Detection

Paper
Code

Handling and Presenting Harmful Text in NLP Research

no code implementations • 29 Apr 2022 • Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen, Leon Derczynski

Text data can pose a risk of harm.

Misinformation Part-Of-Speech Tagging +1

Paper
Add Code

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks

1 code implementation • NAACL 2022 • Paul Röttger, Bertie Vidgen, Dirk Hovy, Janet B. Pierrehumbert

To address this issue, we propose two contrasting paradigms for data annotation.

Descriptive valid +1

Paper
Code

An influencer-based approach to understanding radical right viral tweets

no code implementations • 15 Sep 2021 • Laila Sprejer, Helen Margetts, Kleber Oliveira, David O'Sullivan, Bertie Vidgen

We show that it is crucial to account for the influencer-level structure, and find evidence of the importance of both influencer- and content-level factors, including the number of followers each influencer has, the type of content (original posts, quotes and replies), the length and toxicity of content, and whether influencers request retweets.

Paper
Add Code

Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate

no code implementations • Findings (ACL) 2021 • Austin Botelho, Bertie Vidgen, Scott A. Hale

We show that both text- and visual- enrichment improves model performance, with the multimodal model (0. 771) outperforming other models' F1 scores (0. 544, 0. 737, and 0. 754).

Paper
Add Code

Introducing CAD: the Contextual Abuse Dataset

1 code implementation • NAACL 2021 • Bertie Vidgen, Dong Nguyen, Helen Margetts, Patricia Rossini, Rebekah Tromble

Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic.

Paper
Code

Dynabench: Rethinking Benchmarking in NLP

no code implementations • NAACL 2021 • Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams

We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking.

Benchmarking

Paper
Add Code

An Expert Annotated Dataset for the Detection of Online Misogyny

1 code implementation • EACL 2021 • Ella Guest, Bertie Vidgen, Alexandros Mittos, Nishanth Sastry, Gareth Tyson, Helen Margetts

Online misogyny is a pernicious social problem that risks making online platforms toxic and unwelcoming to women.

Binary Classification Classification

Paper
Code

Tackling Racial Bias in Automated Online Hate Detection: Towards Fair and Accurate Classification of Hateful Online Users Using Geometric Deep Learning

no code implementations • 22 Mar 2021 • Zo Ahmed, Bertie Vidgen, Scott A. Hale

Yet, most research in online hate detection to date has focused on hateful content.

Fairness Feature Engineering

Paper
Add Code

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection

2 code implementations • ACL 2021 • Bertie Vidgen, Tristan Thrush, Zeerak Waseem, Douwe Kiela

We provide a new dataset of ~40, 000 entries, generated and labelled by trained annotators over four rounds of dynamic data creation.

Hate Speech Detection

Paper
Code

Detecting East Asian Prejudice on Social Media

4 code implementations • EMNLP (ALW) 2020 • Bertie Vidgen, Austin Botelho, David Broniatowski, Ella Guest, Matthew Hall, Helen Margetts, Rebekah Tromble, Zeerak Waseem, Scott Hale

The outbreak of COVID-19 has transformed societies across the world as governments tackle the health, economic and social costs of the pandemic.

116

Paper
Code

Directions in Abusive Language Training Data: Garbage In, Garbage Out

no code implementations • 3 Apr 2020 • Bertie Vidgen, Leon Derczynski

Data-driven analysis and detection of abusive online content covers many different tasks, phenomena, contexts, and methodologies.

Abusive Language

Paper
Add Code

Islamophobes are not all the same! A study of far right actors on Twitter

no code implementations • 13 Oct 2019 • Bertie Vidgen, Taha Yasseri, Helen Margetts

Far-right actors are often purveyors of Islamophobic hate speech online, using social media to spread divisive and prejudiced messages which can stir up intergroup tensions and conflict.

Social and Information Networks Computers and Society Physics and Society Applications

Paper
Add Code

Challenges and frontiers in abusive content detection

1 code implementation • WS 2019 • Bertie Vidgen, Alex Harris, Dong Nguyen, Rebekah Tromble, Scott Hale, Helen Margetts

Online abusive content detection is an inherently difficult task.

Abuse Detection

Paper
Code

Detecting weak and strong Islamophobic hate speech on social media

no code implementations • 12 Dec 2018 • Bertie Vidgen, Taha Yasseri

Islamophobic hate speech on social media inflicts considerable harm on both targeted individuals and wider society, and also risks reputational damage for the host platforms.

Word Embeddings

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.