no code implementations • 1 Jan 2019 • Sai Kiran Narayanaswami, Nandan Sudarsanam, Balaraman Ravindran
Robust Policy Search is the problem of learning policies that do not degrade in performance when subject to unseen environment model parameters.
no code implementations • 14 Dec 2017 • Nandan Sudarsanam, Nishanth Kumar, Abhishek Sharma, Balaraman Ravindran
We present a comprehensive analysis of 50 interestingness measures and classify them in accordance with the two properties.
no code implementations • 9 Nov 2017 • Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran
We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting.
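To make "cumulative regret in the stochastic MAB setting" concrete, here is a minimal sketch of the standard UCB1 baseline that such variants build on. This is plain UCB1 with a sqrt(2 ln t / n_i) exploration bonus, NOT EUCBV; the Bernoulli arms, the `means` values and the function name are hypothetical choices for illustration only.

```python
import math
import random


def ucb1_cumulative_regret(means, horizon, seed=0):
    """Run standard UCB1 (not EUCBV) on hypothetical Bernoulli arms.

    `means` are assumed true success probabilities, used only to simulate
    rewards and to measure pseudo-regret against the best arm.
    Returns (cumulative pseudo-regret, pull counts per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once so every estimate is defined
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]  # gap of the chosen arm
    return regret, counts


regret, counts = ucb1_cumulative_regret([0.2, 0.5, 0.8], horizon=5000)
print(round(regret, 1), counts)
```

Cumulative regret here is the summed gap between the best arm's mean and the mean of each arm actually pulled; variance-aware variants change the exploration bonus, not this bookkeeping.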
no code implementations • 7 Apr 2017 • Subhojyoti Mukherjee, K. P. Naveen, Nandan Sudarsanam, Balaraman Ravindran
In this paper, we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold.
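The input and output of the fixed-budget TBP can be illustrated with a naive baseline: spend the budget round-robin across arms, then report the arms whose empirical mean exceeds the threshold. This is a sketch under stated assumptions, NOT AugUCB; the Gaussian unit-variance rewards, the `means` values and the function name are hypothetical.

```python
import random


def uniform_threshold_bandit(means, threshold, budget, seed=0):
    """Naive fixed-budget baseline for the thresholding bandit problem.

    Not AugUCB: it allocates the budget round-robin, then returns the set of
    arms whose empirical mean is above `threshold`. Rewards are drawn from a
    hypothetical Gaussian with unit variance around each arm's mean.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(budget):
        arm = t % k  # uniform (round-robin) allocation of the budget
        counts[arm] += 1
        sums[arm] += rng.gauss(means[arm], 1.0)
    # Classify each sampled arm as above or below the threshold
    return {i for i in range(k) if counts[i] > 0 and sums[i] / counts[i] > threshold}


print(sorted(uniform_threshold_bandit([0.1, 0.9, 2.0, -1.0], threshold=0.5, budget=4000)))
```

Adaptive methods such as the one described above aim to beat this baseline by directing more of the budget to arms whose means lie close to the threshold, since those are the hardest to classify.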
no code implementations • 4 May 2016 • Nandan Sudarsanam, Balaraman Ravindran
One of the proposed methods, X-Random bootstrap, performs better than the baselines in terms of cumulative regret across various degrees of noise and different numbers of trials.