no code implementations • 28 Feb 2023 • Sham M. Kakade, Akshay Krishnamurthy, Gaurav Mahajan, Cyril Zhang
In this paper, we depart from this setup and consider an interactive access model, in which the algorithm can query for samples from the conditional distributions of the HMMs.
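Since no implementation is listed, here is a minimal sketch of the basic primitive in such an interactive access model: sampling the next observation from an HMM's conditional distribution given an observed prefix, computed by forward filtering. The function name and matrix layout are illustrative assumptions, not the paper's code.

```python
import random

def sample_next_obs(T, O, pi, prefix, rng):
    """Sample o_{t+1} ~ P(. | o_1..o_t) for an HMM with transition
    matrix T[i][j], emission matrix O[i][o], and initial distribution pi."""
    n = len(pi)
    b = list(pi)  # belief over the current hidden state
    for o in prefix:
        # condition on the observed symbol, then renormalize
        b = [b[i] * O[i][o] for i in range(n)]
        z = sum(b)
        b = [v / z for v in b]
        # advance the hidden state one step
        b = [sum(b[i] * T[i][j] for i in range(n)) for j in range(n)]
    # predictive distribution over the next observation
    n_obs = len(O[0])
    p = [sum(b[i] * O[i][o] for i in range(n)) for o in range(n_obs)]
    u, acc = rng.random(), 0.0
    for o, q in enumerate(p):
        acc += q
        if u <= acc:
            return o
    return n_obs - 1
```

On a deterministic two-state HMM that alternates between emitting 0 and 1, the conditional sample after the prefix `[0]` is always 1.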
no code implementations • 25 Feb 2023 • Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan, Csaba Szepesvári, Gellért Weisz
The rewards in this game are chosen such that if the learner achieves large reward, then the learner's actions can be used to simulate solving a variant of 3-SAT, where (a) each variable shows up in a bounded number of clauses, and (b) if an instance has no solutions then it also has no solutions satisfying more than a $(1-\epsilon)$-fraction of clauses.
no code implementations • 13 Feb 2023 • Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan
We study a foundational variant of Valiant's and Vapnik and Chervonenkis's Probably Approximately Correct (PAC) learning in which the adversary is restricted to a known family of marginal distributions $\mathscr{P}$.
no code implementations • 22 Feb 2022 • Sanjoy Dasgupta, Gaurav Mahajan, Geelon So
We prove asymptotic convergence for a general class of $k$-means algorithms performed over streaming data from a distribution: the centers asymptotically converge to the set of stationary points of the $k$-means cost function.
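Since no implementation is listed, a toy sketch of the kind of streaming $k$-means algorithm covered by such results is a MacQueen-style online update, where each arriving point nudges its nearest center with a decaying step size. All names here are illustrative, not the paper's code.

```python
def online_kmeans(stream, k):
    """MacQueen-style online k-means: each arriving point moves its
    nearest center toward it with step size 1/(points assigned so far)."""
    centers, counts = [], []
    for x in stream:
        if len(centers) < k:  # seed the first k centers with the first k points
            centers.append(list(x))
            counts.append(1)
            continue
        # index of the nearest center (squared Euclidean distance)
        j = min(range(k),
                key=lambda i: sum((c - a) ** 2 for c, a in zip(centers[i], x)))
        counts[j] += 1
        eta = 1.0 / counts[j]  # decaying learning rate
        centers[j] = [c + eta * (a - c) for c, a in zip(centers[j], x)]
    return centers
```

The decaying step size `1/counts[j]` keeps each center at the running mean of the points assigned to it, which is what makes convergence to stationary points of the $k$-means cost plausible.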
no code implementations • 11 Feb 2022 • Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan
In this work, we make progress on this open problem by presenting the first computational lower bound for RL with linear function approximation: unless NP=RP, no randomized polynomial time algorithm exists for deterministic transition MDPs with a constant number of actions and linear optimal value functions.
no code implementations • 11 Jan 2022 • Robi Bhattacharjee, Gaurav Mahajan
We consider a lifelong learning scenario in which a learner faces a never-ending and arbitrary stream of facts and has to decide which ones to retain in its limited memory.
no code implementations • 8 Nov 2021 • Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan
The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory.
no code implementations • 19 Mar 2021 • Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav Mahajan, Wen Sun, Ruosong Wang
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable, and, notably, also includes new models, such as the Linear $Q^*/V^*$ model in which both the optimal $Q$-function and the optimal $V$-function are linear in some known feature space.
no code implementations • NeurIPS 2020 • Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang
The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$.
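Since no implementation is listed, a minimal sketch of the $\delta = 0$ (exactly realizable) special case is tabular $Q$-iteration on a deterministic MDP, where the Bellman backup needs no expectation over next states. The dictionary-based layout is an illustrative assumption, not the paper's algorithm.

```python
def q_value_iteration(transitions, rewards, gamma, iters=200):
    """Q-iteration on a deterministic MDP: next state = transitions[s][a],
    immediate reward = rewards[s][a]. Returns the optimal Q-table."""
    states = list(transitions)
    Q = {s: {a: 0.0 for a in transitions[s]} for s in states}
    for _ in range(iters):
        for s in states:
            for a in transitions[s]:
                s2 = transitions[s][a]
                # deterministic Bellman backup: no expectation needed
                Q[s][a] = rewards[s][a] + gamma * max(Q[s2].values())
    return Q
```

On a two-state MDP where state 1 pays reward 1 for staying, with $\gamma = 0.5$ the fixed point gives $Q(1, \text{stay}) = 1/(1-\gamma) = 2$.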
no code implementations • 23 Apr 2020 • Max Hopkins, Daniel M. Kane, Shachar Lovett, Gaurav Mahajan
Given a finite set $X \subset \mathbb{R}^d$ and a binary linear classifier $c: \mathbb{R}^d \to \{0, 1\}$, how many queries of the form $c(x)$ are required to learn the label of every point in $X$?
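Since no implementation is listed, the simplest instance of this question is instructive: in one dimension, a threshold classifier's labels on $n$ sorted points can be recovered with $O(\log n)$ queries by binary search for the decision boundary, far fewer than the $n$ queries needed to label each point directly. This toy sketch is illustrative and is not the paper's algorithm.

```python
def learn_labels_1d(xs, query):
    """Recover the labels of all points in xs under an unknown 1-D
    threshold classifier (0s then 1s when sorted), using O(log n)
    label queries via binary search for the first point labeled 1."""
    xs = sorted(xs)
    lo, hi = 0, len(xs)  # first index labeled 1 lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if query(xs[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    # all points before index lo are labeled 0, the rest 1
    return [0] * lo + [1] * (len(xs) - lo)
```

For 10 points and threshold 6, four queries suffice instead of ten.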
no code implementations • 17 Feb 2020 • Simon S. Du, Jason D. Lee, Gaurav Mahajan, Ruosong Wang
2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting.
no code implementations • 15 Jan 2020 • Max Hopkins, Daniel Kane, Shachar Lovett, Gaurav Mahajan
With the explosion of massive, widely available unlabeled data in recent years, finding label- and time-efficient, robust learning algorithms has become ever more important both in theory and in practice.
no code implementations • 1 Aug 2019 • Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan
Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces.
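Since no implementation is listed, a minimal sketch of a policy gradient method is REINFORCE on a two-armed bandit with a softmax policy, using the score-function gradient $\nabla_\theta \log \pi_\theta(a) = e_a - \pi_\theta$. All names are illustrative assumptions, not the paper's code.

```python
import math
import random

def reinforce_bandit(arm_rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a two-armed bandit: theta[a] is the logit of arm a;
    each sampled reward nudges probability mass toward better arms."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        # softmax policy over the two arms
        z = [math.exp(t) for t in theta]
        s = sum(z)
        p = [v / s for v in z]
        a = 0 if rng.random() < p[0] else 1
        r = arm_rewards[a]
        # score-function update: grad log pi(a) = e_a - p
        for i in range(2):
            theta[i] += lr * r * ((1.0 if i == a else 0.0) - p[i])
    return theta
```

With deterministic rewards $(1, 0)$, the learned logits separate so that the policy concentrates on the better arm.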