Search Results for author: Jelena Luketina

Found 7 papers, 3 papers with code

Understanding the Effects of RLHF on LLM Generalisation and Diversity

1 code implementation 10 Oct 2023 Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

OOD generalisation is crucial given the wide range of real-world scenarios in which these models are being used, while output diversity refers to the model's ability to generate varied outputs and is important for a variety of use cases.

Instruction Following

Meta-Gradients in Non-Stationary Environments

no code implementations 13 Sep 2022 Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.

WordCraft: An Environment for Benchmarking Commonsense Agents

1 code implementation ICML Workshop LaReL 2020 Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel

This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment.

Benchmarking, Knowledge Graphs +2

A Survey of Reinforcement Learning Informed by Natural Language

no code implementations 10 Jun 2019 Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.

Decision Making, Instruction Following +5

Progress & Compress: A scalable framework for continual learning

no code implementations ICML 2018 Jonathan Schwarz, Jelena Luketina, Wojciech M. Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, Raia Hadsell

This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task.

Active Learning, Atari Games +1

Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters

1 code implementation 20 Nov 2015 Jelena Luketina, Mathias Berglund, Klaus Greff, Tapani Raiko

Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance.

Hyperparameter Optimization
