no code implementations • 31 Mar 2024 • Steven Bilaj, Sofien Dhouib, Setareh Maghsudi
We study the problem of meta-learning several contextual stochastic bandit tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected regret over the encountered bandits.
no code implementations • 26 Jul 2023 • Behzad Nourani-Koliji, Steven Bilaj, Amir Rezaei Balef, Setareh Maghsudi
In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process.
no code implementations • 14 Nov 2022 • Steven Bilaj, Sofien Dhouib, Setareh Maghsudi
We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning.