no code implementations • 20 Sep 2021 • Conor Newton, Ayalvadi Ganesh, Henry W. J. Reeve
In fact, our regret guarantee matches the asymptotically optimal rate achievable in the full communication setting.
no code implementations • 15 Jan 2020 • Ronshee Chawla, Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai
Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play.
no code implementations • 4 Oct 2019 • Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai
Our setting consists of a large number of agents $n$ that collaboratively and simultaneously solve the same instance of $K$ armed MAB to minimize the average cumulative regret over all agents.