no code implementations • ICML 2018 • Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvari, Steffen Grunewalder
In this problem, when the player pulls an arm, a reward is generated; however, it is not immediately observed.
no code implementations • 22 Feb 2017 • Steffen Grunewalder, Azadeh Khaleghi
The multi-armed restless bandit problem is studied in the case where the pay-off distributions are stationary $\varphi$-mixing.
no code implementations • 18 Jun 2012 • Steffen Grunewalder, Guy Lever, Luca Baldassarre, Massi Pontil, Arthur Gretton
For policy optimisation we compare with least-squares policy iteration where a Gaussian process is used for value function estimation.