no code implementations • 6 Jan 2021 • Kaige Yang
In this setting, the MDP dynamics are useful knowledge to transfer, and they can be inferred by running a uniformly random policy.
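As a rough illustration of how dynamics can be inferred from a uniformly random policy, the sketch below counts observed transitions in a small tabular MDP and normalizes them into an estimated transition model. It is not the paper's method: the function name `estimate_dynamics`, the two-state MDP `P_true`, and the step count are all hypothetical choices made for this example.

```python
import numpy as np

def estimate_dynamics(P_true, n_steps, seed=0):
    """Estimate tabular transition probabilities under a uniformly random policy.

    P_true has shape (S, A, S); P_true[s, a] is the next-state distribution.
    It stands in for the unknown environment and is only used for sampling.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P_true.shape
    counts = np.zeros_like(P_true)
    s = 0  # assumed start state
    for _ in range(n_steps):
        a = rng.integers(n_actions)                     # uniformly random action
        s_next = rng.choice(n_states, p=P_true[s, a])   # environment step
        counts[s, a, s_next] += 1
        s = s_next
    visits = counts.sum(axis=2, keepdims=True)
    # Unvisited (s, a) pairs fall back to a uniform guess.
    return np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)

# Hypothetical 2-state, 2-action MDP used only for this illustration.
P_true = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
])
P_hat = estimate_dynamics(P_true, n_steps=20000)
```

Because every action is chosen with equal probability, each state-action pair reachable from the start state is sampled repeatedly, so the empirical frequencies in `P_hat` approach `P_true` as `n_steps` grows.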
no code implementations • 4 Jun 2020 • Kaige Yang, Laura Toni
Theoretically, we show that the proposed algorithm achieves a $\tilde{\mathcal{O}}(\hat{\beta}\sqrt{dT})$ upper bound on the $T$-round regret, where $d$ is the dimension of the arm features and $\hat{\beta}$ is the learned size of the confidence bound.
no code implementations • 12 Jul 2019 • Kaige Yang, Xiaowen Dong, Laura Toni
In terms of network regret (the sum of cumulative regret over $n$ users), the proposed algorithm scales as $\tilde{\mathcal{O}}(\Psi d\sqrt{nT})$, a significant improvement over the $\tilde{\mathcal{O}}(nd\sqrt{T})$ of the state-of-the-art algorithm GOB.Lin (Cesa-Bianchi et al., 2013).
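To make the comparison between the two network-regret rates concrete, dividing one by the other (and ignoring the constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$) gives

$$\frac{nd\sqrt{T}}{\Psi d\sqrt{nT}} = \frac{\sqrt{n}}{\Psi}.$$

Read this way, the stated bound is the tighter of the two whenever $\Psi < \sqrt{n}$, and the gap widens as the number of users $n$ grows. This is only a reading of the two rates quoted above; the excerpt does not define $\Psi$ or state its range.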
no code implementations • 11 Feb 2019 • Kaige Yang, Xiaowen Dong, Laura Toni
We provide a theoretical analysis of the representation learning problem aimed at learning the latent variables (design matrix) $\Theta$ of observations $Y$ with the knowledge of the coefficient matrix $X$.
no code implementations • 3 Dec 2018 • Hoang Dung Vu, Kok Soon Chai, Bryan Keating, Nurislam Tursynbek, Boyan Xu, Kaige Yang, Xiaoyan Yang, Zhenjie Zhang
Refrigeration and chiller optimization is an important and well-studied topic in mechanical engineering, where most approaches rely on physical models of the equipment that are built on over-simplified assumptions.
no code implementations • 31 Jul 2018 • Kaige Yang, Laura Toni
In this work, we study recommendation systems modelled as contextual multi-armed bandit (MAB) problems.