1 code implementation • 6 Feb 2024 • Nora Belrose, Quintin Pope, Lucia Quirke, Alex Mallen, Xiaoli Fern
The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations.
1 code implementation • 2 Dec 2023 • Alex Mallen, Madeline Brumley, Julia Kharchenko, Nora Belrose
Eliciting Latent Knowledge (ELK) aims to find patterns in a capable neural network's activations that robustly track the true state of the world, especially in hard-to-verify cases where the model's output is untrusted.
1 code implementation • 2 Oct 2023 • Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.
Ranked #3 on Question Answering on TruthfulQA
1 code implementation • 20 Dec 2022 • Alex Mallen, Akari Asai, Victor Zhong, Rajarshi Das, Daniel Khashabi, Hannaneh Hajishirzi
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge.
1 code implementation • 18 Sep 2022 • Alex Mallen, Christoph A. Keller, J. Nathan Kutz
In many scenarios, it is necessary to monitor a complex system via a time-series of observations and determine when anomalous exogenous events have occurred so that relevant actions can be taken.
no code implementations • 10 Jun 2021 • Alex Mallen, Henning Lange, J. Nathan Kutz
Probabilistic forecasting of complex phenomena is paramount to various scientific disciplines and applications.