The policy gradient theorem is defined based on an objective with respect to the initial distribution over states. In the discounted case, this results in policies that are optimal for one distribution over initial states, but may not be uniformly optimal for others, no matter where the agent starts from... (read more)
PDFMETHOD | TYPE | |
---|---|---|
![]() |
Regularization |