This research investigates some of the issues that arise from scalarizing the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm. The paper shows how a naive scalarization can lead to gradients overlapping...
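As a rough illustration of what scalarization means in this setting: A2C is commonly trained on a single scalar loss formed as a weighted sum of an actor (policy-gradient) term, a critic (value-regression) term, and an entropy bonus. The sketch below assumes that common form. The function name `scalarized_a2c_loss`, the coefficient defaults, and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def scalarized_a2c_loss(
    log_probs: Sequence[float],   # log pi(a_t | s_t) for the actions taken
    advantages: Sequence[float],  # advantage estimates A_t (treated as constants)
    values: Sequence[float],      # critic predictions V(s_t)
    returns: Sequence[float],     # bootstrapped return targets R_t
    entropies: Sequence[float],   # policy entropy at each step
    value_coef: float = 0.5,      # assumed weight on the critic objective
    entropy_coef: float = 0.01,   # assumed weight on the entropy objective
) -> float:
    """Collapse three objectives into one scalar via a fixed weighted sum."""
    n = len(log_probs)
    # Actor objective: policy-gradient surrogate, averaged over the batch.
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic objective: mean squared error against the return targets.
    critic_loss = sum((r - v) ** 2 for r, v in zip(returns, values)) / n
    # Entropy objective: maximized, so it enters the loss with a minus sign.
    entropy = sum(entropies) / n
    return actor_loss + value_coef * critic_loss - entropy_coef * entropy


# Toy batch of two transitions (made-up numbers, for illustration only).
loss = scalarized_a2c_loss(
    log_probs=[-1.0, -2.0],
    advantages=[1.0, -1.0],
    values=[0.0, 1.0],
    returns=[1.0, 0.0],
    entropies=[0.5, 0.5],
)
print(loss)
```

Because all three terms collapse into one number, a network whose actor and critic share parameters receives a single gradient equal to the weighted sum of the per-objective gradients, so the fixed weights determine how much each objective contributes to every update.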