no code implementations • 4 May 2020 • Djallel Bouneffouf, Emmanuelle Claeys
We study here the problem of learning the exploration exploitation trade-off in the contextual bandit problem with linear reward function setting.