We propose a reinforcement learning (RL) algorithm that uses mutual-information regularization to optimize a prior action distribution for better performance and exploration. Entropy-based regularization has previously been shown to improve both exploration and robustness in challenging sequential decision-making tasks...
METHOD | TYPE |
---|---|
 | Regularization |
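
The abstract is truncated before it states the objective, so the following is only a minimal sketch of the quantity it names, not the paper's algorithm. It assumes a small tabular setting with a fixed state distribution and uses the textbook definition of mutual information between states and actions, written as the expected KL divergence between the policy and a prior over actions. The function names `marginal_prior` and `mutual_information` are hypothetical and chosen for illustration.

```python
import numpy as np


def marginal_prior(state_dist, policy):
    """Marginal action distribution rho(a) = sum_s mu(s) * pi(a|s).

    state_dist: shape (S,), probabilities over states (assumed fixed here).
    policy:     shape (S, A), each row is pi(.|s).
    """
    return np.asarray(state_dist) @ np.asarray(policy)


def mutual_information(state_dist, policy, prior=None):
    """Expected KL(pi(.|s) || prior) under state_dist, in nats.

    When prior is None, the marginal action distribution is used, and the
    result equals the mutual information I(S; A).
    """
    state_dist = np.asarray(state_dist, dtype=float)
    policy = np.asarray(policy, dtype=float)
    if prior is None:
        prior = marginal_prior(state_dist, policy)
    prior = np.asarray(prior, dtype=float)

    # Terms with pi(a|s) = 0 contribute nothing (0 * log 0 := 0),
    # so replace them with 1.0 before taking logs.
    support = policy > 0
    log_pi = np.log(np.where(support, policy, 1.0))
    log_prior = np.log(np.where(support, prior, 1.0))

    kl_per_state = np.sum(policy * (log_pi - log_prior), axis=1)
    return float(state_dist @ kl_per_state)


if __name__ == "__main__":
    mu = np.array([0.5, 0.5])
    pi = np.array([[0.9, 0.1],
                   [0.6, 0.4]])
    uniform = np.array([0.5, 0.5])
    print("I(S;A) with marginal prior:", mutual_information(mu, pi))
    print("Expected KL with uniform prior:", mutual_information(mu, pi, uniform))
```

For a fixed policy and state distribution, the marginal action distribution is the prior that minimizes the expected KL term, which is one way to read "optimizing a prior action distribution" in this simplified setting. With a uniform prior the same term reduces, up to a constant, to the negative policy entropy used in entropy regularization.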