Search Results for author: Zhiyou Yang

Double Thompson Sampling in Finite Stochastic Games

This algorithm achieves a total regret bound of $\tilde{\mathcal{O}}(D\sqrt{SAT})$ over a time horizon $T$, for games with $S$ states, $A$ actions and diameter $D$.
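
As a rough illustration (not taken from the paper), the leading term of this bound can be evaluated numerically to show how it scales. The sketch below assumes you only care about growth rates: it drops the constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$, and the helper name `regret_bound_order` and the parameter values are hypothetical choices for the example.

```python
import math


def regret_bound_order(D: float, S: int, A: int, T: int) -> float:
    """Leading term D * sqrt(S * A * T) of the stated regret bound.

    Hypothetical helper: constants and the log factors hidden by the
    tilde-O notation are dropped, so only the scaling is meaningful.
    """
    return D * math.sqrt(S * A * T)


# Quadrupling the horizon T only doubles the bound (square-root growth).
base = regret_bound_order(D=10, S=20, A=4, T=10_000)
longer = regret_bound_order(D=10, S=20, A=4, T=40_000)
print(longer / base)  # about 2.0, up to floating-point rounding

# Average regret per step, bound / T, shrinks like D * sqrt(S * A / T).
for horizon in (10_000, 100_000, 1_000_000):
    print(horizon, regret_bound_order(10, 20, 4, horizon) / horizon)
```

Because the bound grows as $\sqrt{T}$, the average regret per step vanishes as $T \to \infty$, which is the usual sense in which such a bound is called sublinear.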
