no code implementations • 6 Mar 2024 • Karthik Sridharan, Seung Won Wilson Yoo
We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round.
no code implementations • 24 Jul 2023 • Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
no code implementations • 13 Oct 2022 • Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan
When these potentials further satisfy certain self-bounding properties, we show that they can be used to provide a convergence guarantee for Gradient Descent (GD) and SGD (even when the paths of gradient flow (GF) and GD/SGD are quite far apart).
no code implementations • 27 Jun 2022 • Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan
A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees.
no code implementations • 19 Jun 2022 • Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
no code implementations • NeurIPS 2021 • Satyen Kale, Ayush Sekhari, Karthik Sridharan
We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$.
no code implementations • NeurIPS 2021 • Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy.
no code implementations • NeurIPS 2020 • Kush Bhatia, Karthik Sridharan
In this setting, we study the problem of minimizing policy regret and provide non-constructive upper bounds on the minimax rate for the problem.
no code implementations • 24 Jun 2020 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed.
no code implementations • NeurIPS 2020 • Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
We study episodic reinforcement learning in Markov decision processes when the agent receives additional feedback per step in the form of several transition observations.
no code implementations • NeurIPS 2019 • Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan
Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce.
no code implementations • 28 Feb 2019 • Jayadev Acharya, Christopher De Sa, Dylan J. Foster, Karthik Sridharan
In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.
no code implementations • 13 Feb 2019 • Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth
Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.
no code implementations • NeurIPS 2018 • Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
We investigate 1) the rate at which refined properties of the empirical risk---in particular, gradients---converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization.
1 code implementation • 11 Sep 2018 • Andrew Cotter, Heinrich Jiang, Serena Wang, Taman Narayan, Maya Gupta, Seungil You, Karthik Sridharan
This new formulation leads to an algorithm that produces a stochastic classifier by playing a two-player non-zero-sum game, solving for what we call a semi-coarse correlated equilibrium, which in turn corresponds to an approximately optimal and feasible solution to the constrained optimization problem.
1 code implementation • 29 Jun 2018 • Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You
Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals.
1 code implementation • 17 Apr 2018 • Andrew Cotter, Heinrich Jiang, Karthik Sridharan
For both the proxy-Lagrangian and Lagrangian formulations, however, we prove that this classifier, instead of having unbounded size, can be taken to be a distribution over no more than $m+1$ models (where $m$ is the number of constraints).
no code implementations • 25 Mar 2018 • Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan
Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm.
no code implementations • 20 Mar 2018 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
We uncover a fairly general principle in online learning: If regret can be (approximately) expressed as a function of certain "sufficient statistics" for the data sequence, then there exists a special Burkholder function that 1) can be used algorithmically to achieve the regret bound and 2) only depends on these sufficient statistics, not the entire data sequence, so that the online strategy is only required to keep the sufficient statistics in memory.
no code implementations • NeurIPS 2017 • Dylan J. Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan
We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning.
no code implementations • 9 Nov 2017 • Thodoris Lykouris, Karthik Sridharan, Eva Tardos
We develop a black-box approach for such problems where the learner observes as feedback only losses of a subset of the actions that includes the selected action.
no code implementations • 13 Apr 2017 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
To develop a general theory of when this type of adaptive regret bound is achievable we establish a connection to the theory of decoupling inequalities for martingales in Banach spaces.
no code implementations • 8 Mar 2017 • Dylan J. Foster, Daniel Reichman, Karthik Sridharan
For two-dimensional grids, our results improve over Globerson et al. (2015) by obtaining optimal recovery in the constant-height regime.
no code implementations • 31 Aug 2016 • Alexander Rakhlin, Karthik Sridharan
We revisit the elegant observation of T. Cover '65 which, perhaps, is not as well-known to the broader community as it should be.
no code implementations • NeurIPS 2016 • Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, Eva Tardos
We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games.
no code implementations • 6 Feb 2016 • Alexander Rakhlin, Karthik Sridharan
We present efficient algorithms for the problem of contextual bandits with i.i.d.
no code implementations • NeurIPS 2016 • Zeyuan Allen-Zhu, Yang Yuan, Karthik Sridharan
The amount of data available in the world is growing faster than our ability to deal with it.
no code implementations • 17 Dec 2015 • Matt J. Kusner, Yu Sun, Karthik Sridharan, Kilian Q. Weinberger
Causal inference has the potential to have a significant impact on medical research, on the prevention and control of diseases, and on identifying factors that drive economic change, to name just a few areas.
no code implementations • 13 Oct 2015 • Alexander Rakhlin, Karthik Sridharan
We study an equivalence of (i) deterministic pathwise statements appearing in the online learning literature (termed "regret bounds"), (ii) high-probability tail bounds for the supremum of a collection of martingales (of a specific form arising from uniform laws of large numbers for martingales), and (iii) in-expectation bounds for the supremum.
no code implementations • NeurIPS 2015 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
We propose a general framework for studying adaptive regret bounds in the online learning framework, including model selection bounds and data-dependent bounds.
no code implementations • 4 Mar 2015 • Alexander Rakhlin, Karthik Sridharan
We study online prediction where regret of the algorithm is measured against a benchmark defined via evolving constraints.
no code implementations • 21 Feb 2015 • Tengyuan Liang, Alexander Rakhlin, Karthik Sridharan
We consider regression with square loss and general classes of functions without the boundedness assumption.
no code implementations • 29 Jan 2015 • Alexander Rakhlin, Karthik Sridharan
We analyze the problem of sequential probability assignment for binary outcomes with side information and logarithmic loss, where regret (or redundancy) is measured with respect to a (possibly infinite) class of experts.
no code implementations • 26 Jan 2015 • Alexander Rakhlin, Karthik Sridharan
This paper establishes minimax rates for online regression with arbitrary classes of functions and general losses.
no code implementations • 26 Jan 2015 • Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, Karthik Sridharan
Recent literature on online learning has focused on developing adaptive algorithms that take advantage of a regularity of the sequence of observations, yet retain worst-case performance guarantees.
no code implementations • 11 Feb 2014 • Alexander Rakhlin, Karthik Sridharan
The optimal rates are shown to exhibit a phase transition analogous to the i.i.d./statistical learning case, studied in (Rakhlin, Sridharan, Tsybakov 2013).
no code implementations • NeurIPS 2013 • Alexander Rakhlin, Karthik Sridharan
We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences.
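The predictable-sequences idea can be illustrated with a minimal Euclidean sketch (the simplest instance of Optimistic Mirror Descent, sometimes called optimistic gradient descent). The function names and the projection domain below are illustrative choices, not the paper's notation: the learner plays using a hint $M_t$ for the upcoming gradient, then corrects with the observed gradient $g_t$.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def optimistic_ogd(gradients, hints, eta=0.1, radius=1.0):
    """Optimistic online gradient descent (Euclidean Optimistic Mirror Descent).

    gradients: observed loss gradients g_t
    hints:     predictable guesses M_t for g_t (e.g. M_t = g_{t-1})
    Returns the sequence of played points x_t.
    """
    x_half = np.zeros_like(gradients[0])  # secondary ("leader") iterate
    played = []
    for g, m in zip(gradients, hints):
        # play a point that already anticipates the hinted gradient
        x = project_ball(x_half - eta * m, radius)
        played.append(x)
        # update the secondary iterate with the true gradient
        x_half = project_ball(x_half - eta * g, radius)
    return played
```

When the hints are accurate, regret scales with $\sum_t \|g_t - M_t\|^2$ rather than $\sum_t \|g_t\|^2$, which is how variance and path-length bounds arise as special cases.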
no code implementations • 6 Aug 2013 • Alexander Rakhlin, Karthik Sridharan, Alexandre B. Tsybakov
Furthermore, for $p\in(0, 2)$, the excess risk rate matches the behavior of the minimax risk of function estimation in regression problems under the well-specified model.
no code implementations • NeurIPS 2012 • Sasha Rakhlin, Ohad Shamir, Karthik Sridharan
We show a principled way of deriving online learning algorithms from a minimax analysis.
no code implementations • 18 Aug 2012 • Alexander Rakhlin, Karthik Sridharan
Variance and path-length bounds can be seen as particular examples of online learning with simple predictable sequences.
no code implementations • NeurIPS 2011 • Andrew Cotter, Ohad Shamir, Nati Srebro, Karthik Sridharan
Mini-batch algorithms have recently received significant attention as a way to speed-up stochastic convex optimization problems.
no code implementations • NeurIPS 2011 • Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data.
no code implementations • NeurIPS 2011 • Nati Srebro, Karthik Sridharan, Ambuj Tewari
We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee.
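As a concrete instance of Mirror Descent, here is a short sketch of the update with the entropy mirror map over the probability simplex (exponentiated gradient / multiplicative weights); the function name and step size are illustrative, not from the paper:

```python
import numpy as np

def exponentiated_gradient(loss_grads, eta=0.1):
    """Mirror Descent over the probability simplex with the entropy mirror map.

    loss_grads: sequence of loss gradients (one vector per round)
    Returns the sequence of weight vectors played.
    """
    d = len(loss_grads[0])
    w = np.full(d, 1.0 / d)  # uniform starting point
    iterates = []
    for g in loss_grads:
        iterates.append(w.copy())
        w = w * np.exp(-eta * np.asarray(g))  # mirror (multiplicative) step
        w /= w.sum()                          # renormalize onto the simplex
    return iterates
```

With the entropy regularizer, the projection step reduces to the normalization above; this is the standard reason Mirror Descent adapts naturally to the geometry of the decision set.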
no code implementations • NeurIPS 2010 • Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
We develop a theory of online learning by defining several complexity measures.
no code implementations • NeurIPS 2010 • Nathan Srebro, Karthik Sridharan, Ambuj Tewari
We establish an excess risk bound of $O(H R_n^2 + \sqrt{H L^*} R_n)$ for ERM with an $H$-smooth loss function and a hypothesis class with Rademacher complexity $R_n$, where $L^*$ is the best risk achievable by the hypothesis class.
no code implementations • 6 Jun 2010 • Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
We consider the problem of sequential prediction and provide tools to study the minimax value of the associated game.
no code implementations • 31 Oct 2009 • Sham M. Kakade, Ohad Shamir, Karthik Sridharan, Ambuj Tewari
The versatility of exponential families, along with their attendant convexity properties, make them a popular and effective statistical model.
no code implementations • NeurIPS 2008 • Sham M. Kakade, Karthik Sridharan, Ambuj Tewari
We provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes.
no code implementations • NeurIPS 2008 • Karthik Sridharan, Shai Shalev-Shwartz, Nathan Srebro
We show that the empirical minimizer of a stochastic strongly convex objective, where the stochastic component is linear, converges to the population minimizer with rate $O(1/n)$.
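The $O(1/n)$ rate can be checked numerically on a toy instance of this setting (this simulation is my illustration, not the paper's construction): take $F(w) = \frac{\lambda}{2}\|w\|^2 + \mathbb{E}_z\langle z, w\rangle$, so the empirical minimizer is $\hat{w} = -\bar{z}/\lambda$, the population minimizer is $w^* = -\mu/\lambda$, and the excess risk is $\|\bar{z}-\mu\|^2/(2\lambda)$.

```python
import numpy as np

def excess_risk(n, lam=1.0, dim=5, trials=200, rng=None):
    """Average excess risk of the empirical minimizer of
    F(w) = (lam/2)||w||^2 + E_z <z, w> with z ~ N(mu, I).

    The excess risk equals ||z_bar - mu||^2 / (2 lam), whose
    expectation is dim / (2 lam n), i.e. O(1/n).
    """
    rng = rng or np.random.default_rng(0)
    mu = np.ones(dim)
    risks = []
    for _ in range(trials):
        z = rng.normal(mu, 1.0, size=(n, dim))
        z_bar = z.mean(axis=0)  # empirical mean of the linear component
        risks.append(np.sum((z_bar - mu) ** 2) / (2 * lam))
    return float(np.mean(risks))
```

Growing $n$ by a factor of 10 should shrink the average excess risk by roughly the same factor, matching the $O(1/n)$ guarantee.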