no code implementations • 31 Jan 2024 • Adrien Bolland, Gaspard Lambrechts, Damien Ernst
To compute near-optimal policies, it is essential in practice to include exploration terms in the learning objective.
1 code implementation • 20 Jun 2023 • Gaspard Lambrechts, Adrien Bolland, Damien Ernst
We then show that this informed objective consists of learning an environment model from which we can sample latent trajectories.
no code implementations • 6 Aug 2022 • Gaspard Lambrechts, Adrien Bolland, Damien Ernst
In summary, this work shows that in its hidden states, a recurrent neural network approximating the Q-function of a partially observable environment reproduces a sufficient statistic from the history that is correlated to the relevant part of the belief for taking optimal actions.
no code implementations • 2 Jun 2021 • Gaspard Lambrechts, Florent De Geeter, Nicolas Vecoven, Damien Ernst, Guillaume Drion
This insight leads to the design of a novel way to initialise any recurrent cell connectivity through a procedure called "warmup" to improve its capability to learn arbitrarily long time dependencies.