Search Results for author: Ramana Kumar

Found 16 papers, 2 papers with code

Evaluating Frontier Models for Dangerous Capabilities

no code implementations • 20 Mar 2024 • Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane

To understand the risks posed by a new AI system, we must understand what it can and cannot do.

Paper
Add Code

Explaining grokking through circuit efficiency

no code implementations • 5 Sep 2023 • Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar

One of the most surprising puzzles in neural network generalisation is grokking: a network with perfect training accuracy but poor generalisation will, upon further training, transition to perfect generalisation.

Paper
Add Code

Scaling Goal-based Exploration via Pruning Proto-goals

1 code implementation • 9 Feb 2023 • Akhil Bagaria, Ray Jiang, Ramana Kumar, Tom Schaul

One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short.

reinforcement-learning Reinforcement Learning (RL)

456

Paper
Code

Solving math word problems with process- and outcome-based feedback

no code implementations • 25 Nov 2022 • Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins

Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks.

Ranked #31 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +1

Paper
Add Code

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

no code implementations • 4 Oct 2022 • Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton

However, an AI system may pursue an undesired goal even when the specification is correct, in the case of goal misgeneralization.

Paper
Add Code

Discovering Agents

no code implementations • 17 Aug 2022 • Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

Causal models of agents have been used to analyse the safety aspects of machine learning systems.

Causal Discovery

Paper
Add Code

Safe Deep RL in 3D Environments using Human Feedback

no code implementations • 20 Jan 2022 • Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane Legg, Jan Leike

In this paper we answer this question in the affirmative, using ReQueST to train an agent to perform a 3D first-person object collection task using data entirely from human contractors.

Paper
Add Code

Formal Methods for the Informal Engineer: Workshop Recommendations

no code implementations • 1 Apr 2021 • Gopal Sarma, James Koppel, Gregory Malecha, Patrick Schultz, Eric Drexler, Ramana Kumar, Cody Roux, Philip Zucker

Formal Methods for the Informal Engineer (FMIE) was a workshop held at the Broad Institute of MIT and Harvard in 2021 to explore the potential role of verified software in the biomedical software ecosystem.

Paper
Add Code

REALab: An Embedded Perspective on Tampering

no code implementations • 17 Nov 2020 • Ramana Kumar, Jonathan Uesato, Richard Ngo, Tom Everitt, Victoria Krakovna, Shane Legg

Standard Markov Decision Process (MDP) formulations of RL and simulated environments mirroring the MDP structure assume secure access to feedback (e. g., rewards).

Reinforcement Learning (RL)

Paper
Add Code

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

no code implementations • 17 Nov 2020 • Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg

How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent?

Paper
Add Code

Scalable Neural Learning for Verifiable Consistency with Temporal Specifications

no code implementations • 25 Sep 2019 • Sumanth Dathathri, Johannes Welbl, Krishnamurthy (Dj) Dvijotham, Ramana Kumar, Aditya Kanade, Jonathan Uesato, Sven Gowal, Po-Sen Huang, Pushmeet Kohli

Formal verification of machine learning models has attracted attention recently, and significant progress has been made on proving simple properties like robustness to small perturbations of the input features.

Adversarial Robustness Language Modelling

Paper
Add Code

Reward Tampering Problems and Solutions in Reinforcement Learning: A Causal Influence Diagram Perspective

no code implementations • 13 Aug 2019 • Tom Everitt, Marcus Hutter, Ramana Kumar, Victoria Krakovna

Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding?

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Modeling AGI Safety Frameworks with Causal Influence Diagrams

no code implementations • 20 Jun 2019 • Tom Everitt, Ramana Kumar, Victoria Krakovna, Shane Legg

Proposals for safe AGI systems are typically made at the level of frameworks, specifying how the components of the proposed system should be trained and interact with each other.

Paper
Add Code

Penalizing side effects using stepwise relative reachability

no code implementations • 4 Jun 2018 • Victoria Krakovna, Laurent Orseau, Ramana Kumar, Miljan Martic, Shane Legg

How can we design safe reinforcement learning agents that avoid unnecessary disruptions to their environment?

Safe Reinforcement Learning

Paper
Add Code

TacticToe: Learning to Prove with Tactics

no code implementations • 2 Apr 2018 • Thibault Gauthier, Cezary Kaliszyk, Josef Urban, Ramana Kumar, Michael Norrish

We implement a automated tactical prover TacticToe on top of the HOL4 interactive theorem prover.

Paper
Add Code

A Proof Strategy Language and Proof Script Generation for Isabelle/HOL

1 code implementation • 9 Jun 2016 • Yutaka Nagashima, Ramana Kumar

We introduce a language, PSL, designed to capture high level proof strategies in Isabelle/HOL.

Logic in Computer Science

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.