no code implementations • 5 Feb 2020 • Kavosh Asadi, Neev Parikh, Ronald E. Parr, George D. Konidaris, Michael L. Littman
We show that the maximum action-value with respect to a deep RBVF can be approximated easily and accurately.
2 code implementations • 2 Dec 2018 • Zhao Song, Ronald E. Parr, Lawrence Carin
The impact of softmax on the value function itself in reinforcement learning (RL) is often viewed as problematic because it leads to sub-optimal value (or Q) functions and interferes with the contraction properties of the Bellman operator.
no code implementations • NeurIPS 2016 • Jason Pazis, Ronald E. Parr, Jonathan P. How
We present the first application of the median of means in a PAC exploration algorithm for MDPs.
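For readers unfamiliar with the estimator named in this entry, here is a minimal sketch of the generic median-of-means estimator. It is not the paper's exploration algorithm, which the excerpt does not describe. The function name `median_of_means`, the `num_blocks` parameter, and the choice to drop leftover samples that do not fill a block are all illustrative assumptions.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Estimate a mean robustly (hypothetical helper, not from the paper).

    Splits the samples into num_blocks equal-sized consecutive blocks,
    averages each block, and returns the median of those block averages.
    Samples left over after the last full block are ignored (an assumption
    made here for simplicity).
    """
    if num_blocks < 1 or num_blocks > len(samples):
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = len(samples) // num_blocks
    block_means = [
        statistics.fmean(samples[i * block_size:(i + 1) * block_size])
        for i in range(num_blocks)
    ]
    return statistics.median(block_means)


# A single large outlier shifts the plain mean but not the median of means.
data = [1, 1, 1, 1, 1, 1, 1, 1, 1000]
plain_mean = statistics.fmean(data)
robust_estimate = median_of_means(data, num_blocks=3)
```

Because the outlier can corrupt only one block average, the median over blocks stays near the typical value, which is the property that makes the estimator attractive for heavy-tailed samples.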
no code implementations • NeurIPS 2016 • Zhao Song, Ronald E. Parr, Xuejun Liao, Lawrence Carin
We then develop a supervised linear feature encoding method that is motivated by insights from linear value function approximation theory, as well as empirical successes from deep RL.