no code implementations • 16 Feb 2021 • Nima Hamidi, Mohsen Bayati
The elliptical potential lemma is a key tool for quantifying uncertainty in estimating parameters of the reward function, but it requires the noise and the prior distributions to be Gaussian.
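For reference, one common statement of the elliptical potential lemma (as it appears in the linear bandit literature; constants here follow a standard version and may differ from this paper's generalization) is:

```latex
\text{For } x_1,\dots,x_T \in \mathbb{R}^d \text{ with } \|x_t\| \le L,
\text{ and } V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top,
\qquad
\sum_{t=1}^{T} \min\!\bigl(1, \|x_t\|_{V_{t-1}^{-1}}^2\bigr)
\;\le\; 2d \log\!\Bigl(1 + \tfrac{T L^2}{d\lambda}\Bigr).
```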
1 code implementation • NeurIPS 2020 • Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi
We study the structure of regret-minimizing policies in the {\em many-armed} Bayesian multi-armed bandit problem: in particular, with $k$ the number of arms and $T$ the time horizon, we consider the case where $k \geq \sqrt{T}$.
no code implementations • 11 Jun 2020 • Nima Hamidi, Mohsen Bayati
This paper studies the stochastic linear bandit problem, where a decision-maker chooses actions from possibly time-dependent sets of vectors in $\mathbb{R}^d$ and receives noisy rewards.
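To make the setting concrete, here is a minimal simulation sketch of the interaction protocol described above, using a plain greedy policy with a ridge estimate; this is only an illustration of the problem setup, not the paper's algorithm, and all numerical choices (dimension, noise level, action-set size) are arbitrary:

```python
import numpy as np

# Hypothetical stochastic linear bandit loop: each round the learner picks a
# vector from a (time-dependent) action set and observes <theta, x> + noise.
rng = np.random.default_rng(0)
d, T, lam = 5, 2000, 1.0
theta = rng.normal(size=d)
theta /= np.linalg.norm(theta)          # unknown unit-norm parameter

V = lam * np.eye(d)                     # regularized Gram matrix
b = np.zeros(d)                         # running sum of r_t * x_t

for t in range(T):
    actions = rng.normal(size=(10, d))          # fresh action set each round
    theta_hat = np.linalg.solve(V, b)           # ridge estimate
    x = actions[np.argmax(actions @ theta_hat)] # greedy action choice
    r = x @ theta + rng.normal(scale=0.1)       # noisy reward
    V += np.outer(x, x)
    b += r * x

theta_hat = np.linalg.solve(V, b)
print(np.linalg.norm(theta_hat - theta))        # estimation error
```

Note that the randomly drawn action sets themselves supply covariate variation, which is why even the greedy policy's estimate improves here.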
2 code implementations • 24 Feb 2020 • Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi
This finding diverges from the notion of free exploration, which relates to covariate variation, as recently discussed in the contextual bandit literature.
no code implementations • 12 Feb 2020 • Nima Hamidi, Mohsen Bayati
First, our new notion of optimism in expectation gives rise to a new algorithm, called sieved greedy (SG), which reduces the over-exploration problem in OFUL.
no code implementations • NeurIPS 2019 • Nima Hamidi, Mohsen Bayati, Kapil Gupta
We consider the $k$-armed stochastic contextual bandit problem with $d$-dimensional features, when both $k$ and $d$ can be large.
no code implementations • 16 Jun 2019 • Nima Hamidi, Mohsen Vahidzadeh, Stephen Baek
Convolutional neural networks (CNNs) have recently gained notable attention in a variety of machine learning tasks, including music classification and style tagging.
1 code implementation • 18 Apr 2019 • Nima Hamidi, Mohsen Bayati
In this paper, we study trace regression, in which a matrix of parameters $B^*$ is estimated via the convex relaxation of a rank-regularized regression or via regularized non-convex optimization.
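As background for the convex relaxation mentioned above: replacing the rank penalty with the nuclear norm yields a convex problem whose proximal operator is singular-value soft-thresholding, the core step of proximal-gradient solvers. A minimal sketch of that step (an illustration of the standard technique, not the paper's estimator):

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # shrink each singular value
    return U @ np.diag(s_shrunk) @ Vt

# Toy input: shrinking can only decrease singular values (and hence rank).
rng = np.random.default_rng(1)
B = rng.normal(size=(6, 4))
B_shrunk = svt(B, tau=1.0)
print(np.linalg.matrix_rank(B_shrunk) <= np.linalg.matrix_rank(B))
```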