Search Results for author: Erik Jenner

Found 8 papers, 4 papers with code

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

no code implementations • 27 Feb 2024 • Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment.

STARC: A General Framework For Quantifying Differences Between Reward Functions

no code implementations • 26 Sep 2023 • Joar Skalse, Lucy Farnik, Sumeet Ramesh Motwani, Erik Jenner, Adam Gleave, Alessandro Abate

This means that reward learning algorithms generally must be evaluated empirically, which is expensive, and that their failure modes are difficult to anticipate in advance.
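The paper's motivation — comparing reward functions directly rather than only empirically — can be illustrated with a simplified pseudometric. This is a hedged sketch, not STARC's actual construction: STARC first canonicalises rewards to remove potential shaping before normalising, a step omitted here, and the function name `reward_distance` is our own.

```python
import numpy as np

def reward_distance(r1, r2):
    """Simplified illustration of a pseudometric on reward functions,
    represented as vectors of rewards over a finite set of transitions.
    Each vector is normalised to unit norm so that positive rescaling
    (which preserves optimal policies) yields distance zero. STARC
    additionally canonicalises away potential shaping first; that
    canonicalisation is omitted in this sketch."""
    def unit(r):
        n = np.linalg.norm(r)
        return r / n if n > 0 else r
    return np.linalg.norm(unit(r1) - unit(r2))

# Rewards that differ only by positive scaling are "the same" reward.
r_a = np.array([1.0, 0.0, 0.0])
r_b = np.array([2.0, 0.0, 0.0])
```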

Calculus on MDPs: Potential Shaping as a Gradient

no code implementations • 20 Aug 2022 • Erik Jenner, Herke van Hoof, Adam Gleave

In reinforcement learning, different reward functions can be equivalent in terms of the optimal policies they induce.
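The best-known such equivalence is potential-based reward shaping (Ng et al., 1999): adding `gamma * Phi(s') - Phi(s)` for any state potential `Phi` leaves optimal policies unchanged. A minimal sketch (the potential `phi` and the toy chain task below are illustrative, not from the paper):

```python
def shaped_reward(reward, potential, gamma):
    """Return a reward function equivalent to `reward` up to potential
    shaping: r'(s, a, s') = r(s, a, s') + gamma * Phi(s') - Phi(s).
    This transformation preserves the set of optimal policies."""
    def r_shaped(s, a, s_next):
        return reward(s, a, s_next) + gamma * potential(s_next) - potential(s)
    return r_shaped

# Toy example: a 1-D chain whose goal is state 3.
base = lambda s, a, s_next: 1.0 if s_next == 3 else 0.0
phi = lambda s: float(s)  # hypothetical potential: progress along the chain
r_equiv = shaped_reward(base, phi, gamma=0.9)
```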

Tasks: Math

Preprocessing Reward Functions for Interpretability

1 code implementation • 25 Mar 2022 • Erik Jenner, Adam Gleave

In many real-world applications, the reward function is too complex to be manually specified.

Extensions of Karger's Algorithm: Why They Fail in Theory and How They Are Useful in Practice

no code implementations • ICCV 2021 • Erik Jenner, Enrique Fita Sanmartín, Fred A. Hamprecht

However, we then present a simple new algorithm for seeded segmentation / graph-based semi-supervised learning that is closely based on Karger's original algorithm, showing that for these problems, extensions of Karger's algorithm can be useful.
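For context, Karger's original contraction algorithm — the starting point for the extensions the paper analyses — repeatedly contracts random edges until two super-vertices remain; the edges crossing the final partition form a candidate min cut. This sketch and its toy graph are illustrative, not the paper's seeded-segmentation variant:

```python
import random

def karger_min_cut(edges, n_vertices, trials=200, seed=0):
    """Karger's randomized contraction algorithm for global min cut.
    Each trial contracts random edges (via union-find) until only two
    components remain, then counts the edges crossing the partition.
    The best cut over many independent trials is returned."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        parent = list(range(n_vertices))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        components = n_vertices
        pool = edges[:]
        rng.shuffle(pool)
        for u, v in pool:
            if components == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv  # contract the edge (u, v)
                components -= 1
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = cut if best is None else min(best, cut)
    return best

# Two triangles joined by a single bridge edge: the global min cut is 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
```

Each trial succeeds with probability at least 2 / (n(n-1)), so many independent trials are needed for a high-probability guarantee.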

Tasks: Gaussian Processes, Image Segmentation, +1

Steerable Partial Differential Operators for Equivariant Neural Networks

4 code implementations • ICLR 2022 • Erik Jenner, Maurice Weiler

In deep learning, however, these maps are usually defined by convolutions with a kernel, whereas they are partial differential operators (PDOs) in physics.
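The kernel/PDO correspondence can be seen in miniature: a PDO discretised by finite differences becomes a convolution stencil. The example below uses the 2-D Laplacian, a rotation-equivariant PDO; it is a toy illustration of the correspondence only, not the paper's steerable-PDO construction.

```python
import numpy as np

# The 2-D Laplacian d^2/dx^2 + d^2/dy^2, discretised by central finite
# differences, becomes this 3x3 convolution stencil.
laplace_stencil = np.array([[0.,  1., 0.],
                            [1., -4., 1.],
                            [0.,  1., 0.]])

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D cross-correlation (no kernel flipping; the
    Laplacian stencil is symmetric, so correlation equals convolution)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sanity check: the Laplacian of a linear function vanishes.
x, y = np.meshgrid(np.arange(5.0), np.arange(5.0))
plane = 2 * x + 3 * y
```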
