no code implementations • 15 Jun 2015 • Manjesh Kumar Hanawal, Venkatesh Saligrama, Michal Valko, Rémi Munos
We consider stochastic sequential learning problems where the learner can observe the average reward of several actions.