Search Results for author: Luke Marks

Found 2 papers, 0 papers with code

Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations

no code implementations · 29 Nov 2023 · Luke Marks

This variant, which leverages scaling dimensionality, is named the Cartesian object and is used to represent simulations (where individual simulacra are the agents and devices in that object).

Tasks: Language Modelling · Object · +1

Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models

no code implementations · 12 Oct 2023 · Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, Philip Torr, Fazl Barez

Large language models (LLMs) fine-tuned by reinforcement learning from human feedback (RLHF) are becoming more widely deployed.
