no code implementations • 22 Aug 2023 • David M. Bossens
Adversarial RCPG also formulates the worst-case dynamics based on the Lagrangian, but learns these dynamics directly and incrementally, as an adversarial policy trained by gradient descent, rather than indirectly and abruptly through constrained optimisation on a sorted value list.
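For intuition, here is a minimal sketch of such an incremental adversary (the toy setup and all names are assumptions for illustration, not the paper's implementation): softmax logits over successor states are updated with a REINFORCE-style gradient so that probability mass moves towards low-value, high-constraint-cost transitions, i.e. the worst case for the agent under the Lagrangian.

```python
import numpy as np

# Toy adversary that learns worst-case dynamics incrementally (hypothetical
# setup). For one (state, action) pair, softmax logits over 4 successor
# states are updated by gradient ascent on lam * C - V, i.e. descent on the
# agent's Lagrangian L = V - lam * C.
rng = np.random.default_rng(0)
V = np.array([1.0, 0.2, 0.8, 0.5])   # toy value estimates per successor
C = np.array([0.0, 0.9, 0.1, 0.4])   # toy constraint-cost estimates
logits = np.zeros(4)                 # adversary parameters
lam, lr = 0.5, 0.05                  # Lagrange multiplier, step size

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    p = softmax(logits)
    s_next = rng.choice(4, p=p)
    grad_logp = -p                   # d log p(s_next) / d logits ...
    grad_logp[s_next] += 1.0         # ... = e_{s_next} - p
    # Ascend lam * C - V: the adversary prefers low value and high cost.
    logits += lr * (lam * C[s_next] - V[s_next]) * grad_logp

print(softmax(logits))  # mass concentrates on the agent's worst successor
```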
1 code implementation • 7 Dec 2022 • David M. Bossens, Philip S. Thomas
In off-policy reinforcement learning, a behaviour policy performs exploratory interactions with the environment to obtain state-action-reward samples which are then used to learn a target policy that optimises the expected return.
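As a concrete baseline for this setting, a minimal sketch of ordinary importance sampling for off-policy evaluation, which reweights each behaviour-policy trajectory by its likelihood ratio under the target policy (the tabular policy representation here is an assumption for illustration):

```python
import numpy as np

def ordinary_is_return(trajectory, pi_target, pi_behaviour, gamma=0.99):
    """Ordinary importance-sampling estimate of the target policy's return
    from one trajectory of (state, action, reward) tuples collected under
    the behaviour policy. pi_target / pi_behaviour are 2-D numpy arrays of
    action probabilities indexed as pi[s, a]."""
    rho, g, discount = 1.0, 0.0, 1.0
    for s, a, r in trajectory:
        rho *= pi_target[s, a] / pi_behaviour[s, a]  # cumulative ratio
        g += discount * r
        discount *= gamma
    return rho * g  # unbiased, but high-variance for long trajectories
```

Averaging this estimator over many trajectories gives an unbiased but often high-variance estimate of the target policy's expected return; variance-reduction techniques such as per-decision or state-based importance sampling address exactly this weakness.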
no code implementations • 5 Sep 2022 • David M. Bossens, Christine Evers
The challenge of language grounding is to fully understand natural language by grounding it in real-world referents.
no code implementations • 21 Apr 2022 • David M. Bossens, Sarvapali Ramchurn, Danesh Tarapore
Purpose of review: This paper reviews opportunities and challenges for decentralised control, change-detection, and learning in the context of resilient robot teams.
no code implementations • 14 Nov 2021 • David M. Bossens, Nicholas Bishop
Constrained Markov decision processes (CMDPs) can provide long-term safety constraints; however, the agent may violate the constraints in an effort to explore its environment.
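One common way to handle CMDP constraints (shown for illustration; not necessarily this paper's method) is Lagrangian relaxation: the agent optimises a penalised reward while a dual ascent step raises the multiplier whenever the constraint budget is exceeded. A minimal sketch, with hypothetical names:

```python
def lagrangian_reward(r, c, lam):
    """Penalised reward that can be handed to any standard RL algorithm."""
    return r - lam * c

def dual_update(lam, avg_episode_cost, budget, lr=0.01):
    """Dual ascent on the multiplier: raise lam while the constraint is
    violated, lower it otherwise, projecting onto lam >= 0."""
    return max(0.0, lam + lr * (avg_episode_cost - budget))
```

Note that the iterates of such a scheme can still violate the constraints during learning, which is precisely the safe-exploration problem the abstract points to.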
1 code implementation • 8 Sep 2021 • David M. Bossens, Danesh Tarapore
To illuminate the elite solutions for a space of behaviours, Quality-Diversity (QD) algorithms require the definition of a suitable behaviour space.
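For concreteness, a minimal MAP-Elites-style sketch (the toy objective, behaviour descriptor, and grid discretisation are all assumptions for illustration); the choice of `behaviour` and of the grid resolution is exactly the behaviour-space design decision the abstract refers to:

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(x):
    return -np.sum(x**2)        # toy objective: maximised at the origin

def behaviour(x):
    return x[:2]                # toy descriptor: first two genotype dims

# MAP-Elites-style archive: discretise the behaviour space into a grid and
# keep the highest-fitness ("elite") solution found in each cell.
bins, archive = 10, {}
for _ in range(5000):
    if archive and rng.random() < 0.9:          # mutate a random elite...
        _, parent = archive[list(archive)[rng.integers(len(archive))]]
        x = parent + 0.1 * rng.standard_normal(5)
    else:                                       # ...or sample a fresh genotype
        x = rng.uniform(-1.0, 1.0, 5)
    b = np.clip((behaviour(x) + 1.0) / 2.0, 0.0, 0.999)
    cell = tuple((b * bins).astype(int))        # grid cell of the descriptor
    if cell not in archive or fitness(x) > archive[cell][0]:
        archive[cell] = (fitness(x), x)

print(len(archive), "cells filled out of", bins**2)
```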
1 code implementation • 3 Jun 2021 • David M. Bossens, Adam J. Sobey
A long-standing challenge in artificial intelligence is lifelong reinforcement learning, where learners are given many tasks in sequence and must transfer knowledge between tasks while avoiding catastrophic forgetting.
no code implementations • 21 May 2021 • David M. Bossens, Danesh Tarapore
In Quality-Diversity (QD) algorithms, which evolve a behaviourally diverse archive of high-performing solutions, the behaviour space is a difficult design choice that should be tailored to the target application.
1 code implementation • 21 Dec 2020 • David M. Bossens, Danesh Tarapore
We also investigate disturbances in the swarm's operating environment, where the swarm has to adapt to drastic changes in the number of available resources and to one of the robots behaving disruptively towards the rest of the swarm, with 30 unique conditions for each such perturbation.
1 code implementation • 4 Mar 2020 • David M. Bossens, Danesh Tarapore
To allow fault recovery from randomly injected faults to different robots in a swarm, a model-free approach may be preferable due to the accumulation of faults in models and the difficulty of predicting the behaviour of neighbouring robots.