Search Results for author: Steffen Grunewalder

Bandits with Delayed, Aggregated Anonymous Feedback

In this problem, when the player pulls an arm, a reward is generated but is not immediately observed.

The multi-armed restless bandit problem is studied in the case where the pay-off distributions are stationary $\varphi$-mixing.

For policy optimisation, we compare with least-squares policy iteration, where a Gaussian process is used for value function estimation.
