Search Results for author: Yotam Wolf

Found 2 papers, 0 papers with code

Tradeoffs Between Alignment and Helpfulness in Language Models

no code implementations • 29 Jan 2024 • Yotam Wolf, Noam Wies, Dorin Shteyman, Binyamin Rothberg, Yoav Levine, Amnon Shashua

Representation engineering yields gains in alignment-oriented tasks such as resistance to adversarial attacks and reduction of social biases, but has also been shown to reduce the model's ability to perform basic tasks.

Language Modelling

Fundamental Limitations of Alignment in Large Language Models

no code implementations • 19 Apr 2023 • Yotam Wolf, Noam Wies, Oshri Avnery, Yoav Levine, Amnon Shashua

An important aspect in developing language models that interact with humans is aligning their behavior to be useful and unharmful for their human users.
