1 code implementation • 31 Oct 2023 • Philipp Dahlinger, Philipp Becker, Maximilian Hüttenrauch, Gerhard Neumann
Before each update, it solves the trust region problem for an optimal step size, resulting in a more stable and faster optimization process.
no code implementations • 24 May 2022 • Maximilian Hüttenrauch, Gerhard Neumann
In contrast, stochastic optimizers that are motivated by policy gradients, such as the Model-based Relative Entropy Stochastic Search (MORE) algorithm, directly optimize the expected fitness function without the use of rankings.
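The distinction between ranking-based and expected-fitness objectives can be illustrated with a minimal sketch. This is not the MORE algorithm. The helper names `rank_weights` and `expected_fitness` and the sample values are hypothetical, chosen only to show that rank-based weights ignore the magnitude of the fitness values, while a sample estimate of the expected fitness does not.

```python
import numpy as np

def rank_weights(fitness):
    """Weights that depend only on the ordering of the samples (higher is better)."""
    ranks = np.argsort(np.argsort(fitness))  # 0 = worst, n-1 = best
    return (ranks + 1) / np.sum(ranks + 1)

def expected_fitness(fitness):
    """Monte Carlo estimate of the expected fitness over the drawn samples."""
    return float(np.mean(fitness))

# Hypothetical fitness values of four samples from a search distribution.
f = np.array([0.1, 0.5, 2.0, 4.0])
# A monotone transformation keeps the ordering but changes the magnitudes.
g = f ** 3

same_rank_weights = np.allclose(rank_weights(f), rank_weights(g))
same_expectation = np.isclose(expected_fitness(f), expected_fitness(g))
```

Here `same_rank_weights` is true and `same_expectation` is false: a ranking-based update would treat `f` and `g` identically, whereas an optimizer that targets the expected fitness sees two different objectives.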
1 code implementation • 17 Jul 2018 • Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann
However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant.
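The two properties named above can be made concrete with a small sketch that contrasts concatenation with an order-independent aggregate. The use of a plain mean, the function names, and the observation values are assumptions made for illustration; the excerpt does not state which aggregation the authors use.

```python
import numpy as np

def concat_observation(neighbor_obs):
    """Concatenate neighbor observations: size grows with the agent count
    and the result depends on the order of the agents."""
    return np.concatenate(neighbor_obs)

def mean_observation(neighbor_obs):
    """Average neighbor observations: fixed size and order-independent."""
    return np.mean(np.stack(neighbor_obs), axis=0)

# Hypothetical 2-D observations of three neighboring agents.
obs = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 1.0])]
shuffled = [obs[2], obs[0], obs[1]]

# (i) Interchangeable agents: the mean is unchanged under reordering,
#     the concatenation is not.
mean_is_invariant = np.allclose(mean_observation(obs), mean_observation(shuffled))
concat_is_invariant = np.array_equal(concat_observation(obs), concat_observation(shuffled))

# (ii) Agent count: the mean keeps a fixed shape, the concatenation grows.
mean_shapes = (mean_observation(obs[:2]).shape, mean_observation(obs).shape)
concat_shapes = (concat_observation(obs[:2]).shape, concat_observation(obs).shape)
```

A policy fed the averaged input therefore needs no retraining of its input layer when agents are added, removed, or relabeled, which is the scaling argument the excerpt makes against concatenation.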
no code implementations • 21 Sep 2017 • Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann
Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents.
1 code implementation • 18 Sep 2017 • Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann
Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view.