1 code implementation • ACL 2022 • Valentin Hofmann, Hinrich Schütze, Janet Pierrehumbert
We introduce FLOTA (Few Longest Token Approximation), a simple yet effective method to improve the tokenization of pretrained language models (PLMs).
1 code implementation • 1 Mar 2024 • Valentin Hofmann, Pratyusha Ria Kalluri, Dan Jurafsky, Sharese King
Here, we demonstrate that language models embody covert racism in the form of dialect prejudice: extending research showing that Americans hold raciolinguistic stereotypes about speakers of African American English, we find that language models exhibit the same prejudice. Their covert stereotypes are more negative than any human stereotypes about African Americans ever experimentally recorded, though closest to those from before the civil rights movement.
1 code implementation • 26 Feb 2024 • Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy
Motivated by this discrepancy, we challenge the prevailing constrained evaluation paradigm for values and opinions in LLMs and explore more realistic unconstrained evaluations.
no code implementations • 5 Feb 2024 • Fangru Lin, Emanuele La Malfa, Valentin Hofmann, Elle Michelle Yang, Anthony Cohn, Janet B. Pierrehumbert
Reasoning about asynchronous plans is challenging since it requires sequential and parallel planning to optimize time costs.
1 code implementation • 31 Jan 2024 • Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
Language models have become a critical technology for tackling a wide range of natural language processing tasks, yet many details about how the best-performing language models were developed are not reported.
no code implementations • 16 Dec 2023 • Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge
We invite submissions to our benchmark and organize results by comparability based on compliance with guidelines such as removal of benchmark contamination from pretraining.
no code implementations • 23 Oct 2023 • Leonie Weissweiler, Valentin Hofmann, Anjali Kantharuban, Anna Cai, Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek Vijayakumar, Haofei Yu, Hinrich Schütze, Kemal Oflazer, David R. Mortensen
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills.
1 code implementation • 14 Dec 2022 • Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
We propose a fully unsupervised method to detect bias in contextualized embeddings.
no code implementations • 24 Oct 2022 • Leonie Weissweiler, Valentin Hofmann, Abdullatif Köksal, Hinrich Schütze
Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics.
1 code implementation • ACL 2022 • Leonie Weissweiler, Valentin Hofmann, Masoud Jalili Sabet, Hinrich Schütze
We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages.
no code implementations • 16 Mar 2022 • Valentin Hofmann, Goran Glavaš, Nikola Ljubešić, Janet B. Pierrehumbert, Hinrich Schütze
While pretrained language models (PLMs) have been shown to possess a plethora of linguistic knowledge, the existing body of research has largely neglected extralinguistic knowledge, which is generally difficult to obtain by pretraining on text alone.
1 code implementation • Findings (NAACL) 2022 • Valentin Hofmann, Xiaowen Dong, Janet B. Pierrehumbert, Hinrich Schütze
The increasing polarization of online political discourse calls for computational tools that automatically detect and monitor ideological divides in social media.
1 code implementation • ACL 2021 • Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
How does the input segmentation of pretrained language models (PLMs) affect their interpretations of complex words?
1 code implementation • ACL 2021 • Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
Static word embeddings that represent words by a single vector cannot capture the variability of word meaning in different linguistic and extralinguistic contexts.
no code implementations • ACL 2020 • Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze
We present the first study that examines the evolution of morphological families, i.e., sets of morphologically related words such as "trump", "antitrumpism", and "detrumpify", in social media.
no code implementations • ACL 2020 • Valentin Hofmann, Hinrich Schütze, Janet Pierrehumbert
The auto-encoder models MWF in English surprisingly well by combining syntactic and semantic information with associative information from the mental lexicon.
1 code implementation • EMNLP 2020 • Valentin Hofmann, Janet B. Pierrehumbert, Hinrich Schütze
Can pretrained language models (PLMs) generate derivationally complex words?