Search Results for author: Monte MacDiarmid

Found 3 papers, 2 papers with code

Understanding and Controlling a Maze-Solving Policy Network

No code implementations · 12 Oct 2023 · Ulisse Mini, Peli Grietzer, Mrinank Sharma, Austin Meek, Monte MacDiarmid, Alexander Matt Turner

To understand the goals and goal representations of AI systems, we carefully study a pretrained reinforcement learning policy that solves mazes by navigating to a range of target squares.

Activation Addition: Steering Language Models Without Optimization

1 code implementation · 20 Aug 2023 · Alexander Matt Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid

We demonstrate ActAdd on GPT-2 on OpenWebText and ConceptNet, and replicate the effect on Llama-13B and GPT-J-6B.

Task: Prompt Engineering