no code implementations • 17 May 2024 • Michael Shliselberg, Ashkan Kazemi, Scott A. Hale, Shiri Dori-Hacohen
To the best of our knowledge, SynDy is the first paper utilizing LLMs to create fine-grained synthetic labels for tasks of direct relevance to misinformation mitigation, namely Claim Matching, Topical Clustering, and Claim Relationship Classification.
1 code implementation • 1 May 2024 • Xi Chen, Scott A. Hale, David Jurgens, Mattia Samory, Ethan Zuckerman, Przemyslaw A. Grabowicz
News coverage profoundly affects how countries and individuals behave in international relations.
no code implementations • 27 Apr 2024 • Manuel Tonneau, Diyi Liu, Samuel Fraiberger, Ralph Schroeder, Scott A. Hale, Paul Röttger
We find that HS datasets for these languages exhibit a strong geo-cultural bias, largely overrepresenting a handful of countries (e.g., US and UK for English) relative to their prominence in both the broader social media population and the general population speaking these languages.
1 code implementation • 24 Apr 2024 • Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale
Human feedback plays a central role in the alignment of Large Language Models (LLMs).
1 code implementation • 18 Apr 2024 • Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, Agasthya Gangavarapu, Ananya Gangavarapu, James Gealy, Rajat Ghosh, James Goel, Usman Gohar, Sujata Goswami, Scott A. Hale, Wiebke Hutiri, Joseph Marvin Imperial, Surgan Jandial, Nick Judd, Felix Juefei-Xu, Foutse khomh, Bhavya Kailkhura, Hannah Rose Kirk, Kevin Klyman, Chris Knotz, Michael Kuchnik, Shachi H. Kumar, Srijan Kumar, Chris Lengerich, Bo Li, Zeyi Liao, Eileen Peters Long, Victor Lu, Sarah Luger, Yifan Mai, Priyanka Mary Mammen, Kelvin Manyeki, Sean McGregor, Virendra Mehta, Shafee Mohammed, Emanuel Moss, Lama Nachman, Dinesh Jinenhally Naganna, Amin Nikanjam, Besmira Nushi, Luis Oala, Iftach Orr, Alicia Parrish, Cigdem Patlak, William Pietri, Forough Poursabzi-Sangdeh, Eleonora Presani, Fabrizio Puletti, Paul Röttger, Saurav Sahay, Tim Santos, Nino Scherrer, Alice Schoenauer Sebag, Patrick Schramowski, Abolfazl Shahbazi, Vin Sharma, Xudong Shen, Vamsi Sistla, Leonard Tang, Davide Testuggine, Vithursan Thangarasa, Elizabeth Anne Watkins, Rebecca Weiss, Chris Welty, Tyler Wilbers, Adina Williams, Carole-Jean Wu, Poonam Yadav, Xianjun Yang, Yi Zeng, Wenhui Zhang, Fedor Zhdanov, Jiacheng Zhu, Percy Liang, Peter Mattson, Joaquin Vanschoren
We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark.
no code implementations • 14 Nov 2023 • Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger
While some of the models do not give a single unsafe response, most give unsafe responses to more than 20% of the prompts, with over 50% unsafe responses in the extreme.
no code implementations • 27 Oct 2023 • Dorian Quelle, Calvin Cheng, Alexandre Bovet, Scott A. Hale
Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries, suggesting that some misinformation permeates language barriers.
no code implementations • 11 Oct 2023 • Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale
Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs).
no code implementations • 3 Oct 2023 • Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers.
no code implementations • 15 Sep 2023 • Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale
In this paper, we quantify stereotypical bias in popular LLMs according to an Indian-centric frame and compare bias levels between the Indian and Western contexts.
1 code implementation • 31 Jul 2023 • Angus R. Williams, Hannah Rose Kirk, Liam Burke, Yi-Ling Chung, Ivan Debono, Pica Johansson, Francesca Stevens, Jonathan Bright, Scott A. Hale
We find that (i) small amounts of diverse data are hugely beneficial to generalisation and model adaptation; (ii) models transfer more easily across demographics but models trained on cross-domain data are more generalisable; (iii) some groups contribute more to generalisability than others; and (iv) dataset similarity is a signal of transferability.
no code implementations • 9 Mar 2023 • Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale
Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in coming years due to integration in product interfaces like ChatGPT or search engines like Bing.
no code implementations • 14 Oct 2022 • Ashkan Kazemi, Artem Abzaliev, Naihao Deng, Rui Hou, Scott A. Hale, Verónica Pérez-Rosas, Rada Mihalcea
We propose a novel system to help fact-checkers formulate search queries for known misinformation claims and effectively search across multiple social media platforms.
1 code implementation • TRAC (COLING) 2022 • Hannah Rose Kirk, Bertie Vidgen, Scott A. Hale
Annotating abusive language is expensive, logistically complex and creates a risk of psychological harm.
no code implementations • 11 Aug 2022 • Ahmet Kurnaz, Scott A. Hale
Polarization and echo chambers are often studied in the context of explicitly political events such as elections, and little scholarship has examined the mixing of political groups in non-political contexts.
no code implementations • 14 Feb 2022 • Ashkan Kazemi, Zehua Li, Verónica Pérez-Rosas, Scott A. Hale, Rada Mihalcea
We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings using multilingual transformer models such as XLM-RoBERTa and multilingual embeddings such as LaBSE and SBERT.
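The retrieval side of such experiments reduces to ranking candidate claims by embedding similarity. Below is a minimal, hedged sketch of that step, assuming sentence embeddings (e.g., from LaBSE or SBERT) have already been computed; the toy vectors here are purely illustrative, not the paper's data or models.

```python
import numpy as np

def rank_claims(query_emb, claim_embs):
    """Rank candidate claim embeddings by cosine similarity to a query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    c = claim_embs / np.linalg.norm(claim_embs, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity of each candidate to the query
    return np.argsort(-scores), scores

# Toy example: three candidate claim embeddings and one query.
claims = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
order, scores = rank_claims(query, claims)
print(order)  # indices of candidates, most similar first
```

In practice the embeddings would come from a multilingual encoder so that, for cross-lingual settings, a Hindi query can be matched directly against English claims in the same vector space.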
no code implementations • 6 Sep 2021 • Shiri Dori-Hacohen, Roberto Montenegro, Fabricio Murai, Scott A. Hale, Keen Sung, Michela Blain, Jennifer Edwards-Johnson
While part (3) of this work specifically focuses on the health domain, the fundamental computer science advances and contributions stemming from research efforts in bias reduction and Fairness via AI have broad implications in all areas of society.
1 code implementation • NAACL 2022 • Hannah Rose Kirk, Bertram Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale
Using the test suite, we expose weaknesses in existing hate detection models.
no code implementations • Findings (ACL) 2021 • Austin Botelho, Bertie Vidgen, Scott A. Hale
We show that both text- and visual-enrichment improve model performance, with the multimodal model (0.771) outperforming the other models' F1 scores (0.544, 0.737, and 0.754).
no code implementations • 8 Jun 2021 • Ashkan Kazemi, Kiran Garimella, Gautam Kishore Shahi, Devin Gaffney, Scott A. Hale
There is currently no easy way to fact-check content on WhatsApp and other end-to-end encrypted platforms at scale.
no code implementations • ACL 2021 • Ashkan Kazemi, Kiran Garimella, Devin Gaffney, Scott A. Hale
We train our own embedding model using knowledge distillation and a high-quality "teacher" model in order to address the imbalance in embedding quality between the low- and high-resource languages in our dataset.
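Embedding-space knowledge distillation typically trains the student to reproduce the teacher's vectors, e.g., by minimising a mean-squared-error loss between the two. The sketch below illustrates that objective with a toy linear student and synthetic "teacher" embeddings; the data, dimensions, and learning rate are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for fixed, high-quality teacher embeddings of a set of sentences.
X = rng.normal(size=(64, 8))      # toy input features for the student
W_true = rng.normal(size=(8, 4))
teacher = X @ W_true              # teacher embeddings (synthetic)

# A tiny linear "student" trained by gradient descent on the MSE
# between its embeddings and the teacher's.
W = np.zeros((8, 4))
lr = 0.01
for _ in range(500):
    pred = X @ W
    grad = 2 * X.T @ (pred - teacher) / len(X)  # gradient of MSE loss
    W -= lr * grad

mse = np.mean((X @ W - teacher) ** 2)
print(mse)  # distillation loss after training
```

The same objective carries over to neural students: a low-resource-language encoder is pushed toward the teacher's embedding space using parallel or comparable text.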
1 code implementation • 3 May 2021 • Alexander Robertson, Farhana Ferdousi Liza, Dong Nguyen, Barbara McGillivray, Scott A. Hale
The semantics of emoji has, to date, been considered from a static perspective.
no code implementations • 22 Mar 2021 • Zo Ahmed, Bertie Vidgen, Scott A. Hale
Yet, most research in online hate detection to date has focused on hateful content.
no code implementations • 15 May 2019 • Zijian Wang, Scott A. Hale, David Adelani, Przemyslaw A. Grabowicz, Timo Hartmann, Fabian Flöck, David Jurgens
In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media.
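A common form of bias correction here is post-stratification: per-group estimates from the platform are reweighted by each group's share of the general population rather than its (skewed) share of the platform. The numbers below are a hypothetical illustration of that arithmetic, not results from the paper.

```python
# Hypothetical post-stratification example: social media over-represents
# younger users, so the naive platform average is reweighted by each
# group's share of the general population.
platform_share = {"18-29": 0.50, "30-49": 0.35, "50+": 0.15}
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}
group_mean = {"18-29": 0.6, "30-49": 0.5, "50+": 0.3}  # toy per-group outcome

naive = sum(platform_share[g] * group_mean[g] for g in group_mean)
corrected = sum(population_share[g] * group_mean[g] for g in group_mean)
print(round(naive, 3), round(corrected, 3))  # the naive estimate is inflated
```

Demographic inference supplies the group labels that make this reweighting possible when user demographics are not self-reported.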
1 code implementation • 27 Aug 2018 • Chico Q. Camargo, Scott A. Hale, Peter John, Helen Z. Margetts
Recent election surprises, regime changes, and political shocks indicate that political agendas have become more fast-moving and volatile.
no code implementations • 1 Feb 2017 • Scott A. Hale, Irene Eleta
The number and quality of user reviews greatly affects consumer purchasing decisions.
no code implementations • 6 May 2016 • Scott A. Hale
The number of user reviews of tourist attractions, restaurants, mobile apps, etc.
no code implementations • 28 Aug 2015 • Suin Kim, Sungjoon Park, Scott A. Hale, Sooyoung Kim, Jeongmin Byun, Alice Oh
We study multilingualism by collecting and analyzing a large dataset of the content written by multilingual editors of the English, German, and Spanish editions of Wikipedia.
no code implementations • 1 Jun 2015 • Han-Teng Liao, King-wa Fu, Scott A. Hale
This paper presents a multilingual study of microblog posts, examining, per post, (a) how much can be said, (b) how much is written in terms of characters and bytes, and (c) how much information content is conveyed in posts by different organizations in different languages.
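The character/byte distinction matters because scripts differ in how many bytes encode one character, so a fixed per-post limit constrains languages unequally. A minimal illustration (the example sentences are invented, not from the paper):

```python
# Character vs. byte counts under UTF-8 differ across scripts,
# so a byte-based post limit allows far fewer CJK characters.
posts = {
    "English": "Breaking news today",
    "Chinese": "今日突发新闻",
}
for lang, text in posts.items():
    print(lang, len(text), len(text.encode("utf-8")))
```

Each of the Chinese characters above occupies three bytes in UTF-8, while ASCII characters occupy one, which is one reason character limits and byte limits diverge across languages.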
no code implementations • 4 Jan 2015 • Scott A. Hale
This article analyzes users who edit Wikipedia articles about Okinawa, Japan, in English and Japanese.
no code implementations • 3 Dec 2013 • Scott A. Hale
This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users).