Search Results for author: Francis Rhys Ward

Found 4 papers, 0 papers with code

The Reasons that Agents Act: Intention and Instrumental Goals

no code implementations 11 Feb 2024 Francis Rhys Ward, Matt MacDermott, Francesco Belardinelli, Francesca Toni, Tom Everitt

In addition, we show how our definition relates to past concepts, including actual causality, and the notion of instrumental goals, which is a core idea in the literature on safe AI agents.

Philosophy

Honesty Is the Best Policy: Defining and Mitigating AI Deception

no code implementations NeurIPS 2023 Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt

There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games.

Philosophy

Experiments with Detecting and Mitigating AI Deception

no code implementations 26 Jun 2023 Ismail Sahbane, Francis Rhys Ward, C Henrik Åslund

How to detect and mitigate deceptive AI systems is an open problem for the field of safe and trustworthy AI.

Argumentative Reward Learning: Reasoning About Human Preferences

no code implementations 28 Sep 2022 Francis Rhys Ward, Francesco Belardinelli, Francesca Toni

We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback.

reinforcement-learning, Reinforcement Learning (RL)
