Search Results for author: Lev McKinney

Found 2 papers, 1 paper with code

Eliciting Latent Predictions from Transformers with the Tuned Lens

2 code implementations • 14 Mar 2023 • Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt

We analyze transformers from the perspective of iterative inference, seeking to understand how model predictions are refined layer by layer.

Language Modelling

On The Fragility of Learned Reward Functions

No code implementations • 9 Jan 2023 • Lev McKinney, Yawen Duan, David Krueger, Adam Gleave

Our work focuses on demonstrating and studying the causes of these relearning failures in the domain of preference-based reward learning.

Continuous Control
