2 code implementations • Conference on Neural Information Processing Systems Datasets and Benchmarks Track 2023 • Florian Felten, Lucas N. Alegre, Ann Nowé, Ana L. C. Bazzan, El-Ghazali Talbi, Grégoire Danoy, Bruno C. da Silva
Multi-objective reinforcement learning (MORL) algorithms extend standard reinforcement learning (RL) to scenarios where agents must optimize multiple, potentially conflicting objectives, each represented by a distinct reward function.
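For intuition, here is a minimal sketch of the vector-reward setting and a linear scalarization; the toy objectives, weights, and function names are illustrative, not the paper's API:

```python
import numpy as np

# Hypothetical vector-valued reward: one component per objective,
# e.g. (task progress, negative energy cost).
def vector_reward(state, action):
    progress = float(action == state % 2)   # toy objective 1
    energy_cost = -0.1 * action             # toy objective 2
    return np.array([progress, energy_cost])

# Linear scalarization: a preference weight vector w collapses the
# reward vector into a single scalar, recovering standard RL.
def scalarize(r_vec, w):
    return float(np.dot(r_vec, w))

w = np.array([0.7, 0.3])  # relative importance of each objective
r = vector_reward(state=3, action=1)
print(scalarize(r, w))
```

Different choices of w trace out different trade-offs between the objectives, which is why MORL methods typically aim for a set of policies rather than a single one.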
2 code implementations • 18 Jan 2023 • Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, Ann Nowé, Bruno C. da Silva
Finally, we introduce a bound that characterizes the maximum utility loss (with respect to the optimal solution) incurred by the partial solutions computed by our method throughout learning.
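For a sense of the flavor of such guarantees, the classic SF&GPI bound of Barreto et al. (2017) relates the utility loss of a transferred policy to the distance between reward weight vectors; this is an illustrative, previously known bound, not necessarily the one introduced in this paper:

```latex
\lVert Q^{*}_{w} - Q^{\pi^{\mathrm{GPI}}}_{w} \rVert_{\infty}
\;\le\; \frac{2\,\phi_{\max}}{1-\gamma}\, \min_{i} \lVert w - w_{i} \rVert
```

Here \(\phi_{\max}\) bounds the feature norm, \(\gamma\) is the discount factor, and the \(w_i\) are the weight vectors of previously solved tasks.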
2 code implementations • Benelux Conference on Artificial Intelligence (BNAIC/BeNeLearn) 2022 • Lucas N. Alegre, Florian Felten, El-Ghazali Talbi, Grégoire Danoy, Ann Nowé, Ana L. C. Bazzan, Bruno C. da Silva
We introduce MO-Gym, an extensible library containing a diverse set of multi-objective reinforcement learning environments.
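A minimal usage sketch, assuming the library's current packaging under the MO-Gymnasium name and the deep-sea-treasure-v0 environment id:

```python
import mo_gymnasium as mo_gym

# Environments follow the Gymnasium API, except that step() returns a
# *vector* reward with one entry per objective.
env = mo_gym.make("deep-sea-treasure-v0")
obs, info = env.reset(seed=0)

obs, vec_reward, terminated, truncated, info = env.step(env.action_space.sample())
print(vec_reward)  # one reward component per objective
```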
1 code implementation • 22 Jun 2022 • Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva
If reward functions can be expressed as linear combinations of shared features, and the agent has previously learned a set of policies for different tasks, successor features (SFs) can be exploited to combine such policies and identify reasonable solutions for new problems.
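A minimal sketch of the underlying idea, generalized policy improvement (GPI) over successor features; the shapes and names here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# psi[i] holds the successor features of previously learned policy i:
# psi[i][s, a] = expected discounted sum of features under policy i.
# For a new task with reward r(s, a) = phi(s, a) . w, each old policy's
# action-values are recovered without further learning:
#   Q_i(s, a) = psi[i][s, a] . w
def gpi_action(psi, w, s):
    # Stack the Q-values of all stored policies at state s, then act
    # greedily over both policies and actions (GPI).
    q = np.stack([p[s] @ w for p in psi])  # shape: (n_policies, n_actions)
    return int(q.max(axis=0).argmax())

# Toy usage: 2 stored policies, 3 states, 2 actions, 4 features.
rng = np.random.default_rng(0)
psi = [rng.normal(size=(3, 2, 4)) for _ in range(2)]
w = np.array([1.0, 0.0, -0.5, 0.2])  # new task's reward weights
print(gpi_action(psi, w, s=1))
```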
1 code implementation • 20 May 2021 • Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva
Non-stationary environments are challenging for reinforcement learning algorithms.
no code implementations • 9 Apr 2020 • Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva
In this paper we analyze the effects that different sources of non-stationarity have on a network of traffic signals, in which each signal is modeled as a learning agent.
no code implementations • 24 Nov 2017 • Francisco M. Garcia, Bruno C. da Silva, Philip S. Thomas
In this paper we consider the problem of how a reinforcement learning agent, tasked with solving a set of related Markov decision processes, can leverage knowledge acquired early in its lifetime to more rapidly solve novel but related tasks.