no code implementations • 23 Apr 2024 • Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
We use local simulator access to unlock new statistical guarantees that were previously out of reach: we show that MDPs with low coverability (Xie et al., 2023) -- a general structural condition that subsumes Block MDPs and Low-Rank MDPs -- can be learned in a sample-efficient fashion with only $Q^{\star}$-realizability (realizability of the optimal state-action value function); existing online RL algorithms require significantly stronger representation conditions.
no code implementations • 15 Apr 2024 • Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin
Our main results settle the statistical and computational complexity of online estimation in this framework.
no code implementations • 22 Mar 2024 • Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making.
1 code implementation • 11 Mar 2024 • Philip Amortila, Dylan J. Foster, Akshay Krishnamurthy
We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward function -- as a conceptual framework to systematize the study of exploration.
no code implementations • 18 Jan 2024 • Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie
The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other.
no code implementations • 27 Dec 2023 • Dylan J. Foster, Alexander Rakhlin
These lecture notes give a statistical perspective on the foundations of reinforcement learning and interactive decision making.
no code implementations • 17 Oct 2023 • Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang
This work studies training instabilities of behavior cloning with deep neural networks.
no code implementations • NeurIPS 2023 • Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin
A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation are required.
no code implementations • 1 May 2023 • Dylan J. Foster, Dean P. Foster, Noah Golowich, Alexander Rakhlin
Compared to the best results for the single-agent setting, our bounds have additional gaps.
no code implementations • 24 Apr 2023 • Andrew Wagenmaker, Dylan J. Foster
We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance.
1 code implementation • 12 Apr 2023 • Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions.
no code implementations • 22 Mar 2023 • Dylan J. Foster, Noah Golowich, Sham M. Kakade
They are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is sparse in the sense that it can be represented as a mixture of a small number of product policies.
no code implementations • 19 Jan 2023 • Dylan J. Foster, Noah Golowich, Yanjun Han
Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation.
no code implementations • 14 Nov 2022 • Aleksandrs Slivkins, Karthik Abinav Sankararaman, Dylan J. Foster
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
no code implementations • 9 Oct 2022 • Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade
Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.
1 code implementation • 12 Jul 2022 • Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro
In the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leaving a significant gap between theory and practice.
no code implementations • 27 Jun 2022 • Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan
A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees.
no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford
Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.
no code implementations • 9 Jun 2022 • Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford
In real-world reinforcement learning applications, the learner's observation space is typically high-dimensional, containing both relevant and irrelevant information about the task at hand.
no code implementations • 27 Dec 2021 • Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin
The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
no code implementations • 21 Nov 2021 • Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu
This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL.
no code implementations • NeurIPS 2020 • Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert
Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge.
no code implementations • NeurIPS 2021 • Dylan J. Foster, Akshay Krishnamurthy
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise, often quantified by the performance of the best hypothesis; such results are known as first-order or small-loss guarantees.
no code implementations • 14 Apr 2021 • Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro
We provide new insights on eluder dimension, a complexity measure that has been extensively used to bound the regret of algorithms for online bandits and reinforcement learning with function approximation.
no code implementations • NeurIPS 2020 • Constantinos Daskalakis, Dylan J. Foster, Noah Golowich
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games).
no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford
We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.
no code implementations • 7 Oct 2020 • Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
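The gap-dependent phenomenon can be illustrated with a minimal UCB1 simulation on a two-armed Bernoulli bandit (a sketch only; the arm means, horizon, and variable names below are illustrative, not from the paper):

```python
import numpy as np

# Two-armed Bernoulli bandit with gap Delta = 0.3 between the best and
# second-best arm. UCB1 pulls the suboptimal arm only O(log T / Delta^2)
# times, rather than a constant fraction of rounds.
rng = np.random.default_rng(0)
means = np.array([0.8, 0.5])
T = 5000
counts = np.zeros(2)
sums = np.zeros(2)

for t in range(T):
    if t < 2:
        arm = t  # pull each arm once to initialize the estimates
    else:
        # empirical mean plus a confidence bonus that shrinks with pulls
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < means[arm])
    counts[arm] += 1
    sums[arm] += reward

suboptimal_pulls = counts[1]  # grows logarithmically in T, not linearly
```

On an "easy" instance with a large gap, `suboptimal_pulls` stays a small fraction of the horizon, which is exactly the instance-dependent improvement the abstract refers to.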
no code implementations • 2 Jul 2020 • Blair Bilodeau, Dylan J. Foster, Daniel M. Roy
We consider the classical problem of sequential probability assignment under logarithmic loss while competing against an arbitrary, potentially nonparametric class of experts.
no code implementations • 24 Jun 2020 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed.
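As a sketch of the objects involved -- the $\epsilon$-stationarity criterion $\|\nabla F(x)\|\le \epsilon$ and a Hessian-vector product obtained from gradient queries alone -- consider a toy quadratic $F(x) = \tfrac{1}{2} x^\top A x$ (all names below are illustrative, not from the paper):

```python
import numpy as np

# Toy objective F(x) = 0.5 * x^T A x with analytic gradient A x.
A = np.diag([1.0, 2.0, 3.0])
grad_F = lambda x: A @ x

def hessian_vector_product(x, v, h=1e-5):
    # (grad F(x + h v) - grad F(x)) / h approximates H(x) v using only
    # gradient evaluations, without ever forming the Hessian.
    return (grad_F(x + h * v) - grad_F(x)) / h

x = np.array([1e-4, 0.0, 0.0])
eps = 1e-3
# epsilon-approximate stationary point: gradient norm at most eps
is_stationary = np.linalg.norm(grad_F(x)) <= eps
```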
no code implementations • 19 Jun 2020 • Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo
In statistical learning, algorithms for model selection allow the learner to adapt to the complexity of the best hypothesis class in a sequence.
no code implementations • L4DC 2020 • Dylan J. Foster, Alexander Rakhlin, Tuhin Sarkar
We introduce algorithms for learning nonlinear dynamical systems of the form $x_{t+1}=\sigma(\Theta^{\star}x_t)+\varepsilon_t$, where $\Theta^{\star}$ is a weight matrix, $\sigma$ is a nonlinear link function, and $\varepsilon_t$ is a mean-zero noise process.
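A minimal simulation of such a system, generating the trajectory data a learner would observe (assuming $\sigma = \tanh$ as the link function for concreteness; the paper treats a class of links):

```python
import numpy as np

# Simulate x_{t+1} = sigma(Theta* x_t) + eps_t with sigma = tanh.
rng = np.random.default_rng(0)
d, T = 3, 200
theta_star = rng.normal(size=(d, d)) / np.sqrt(d)  # unknown weight matrix
sigma = np.tanh                                    # nonlinear link function

x = np.zeros(d)
trajectory = [x]
for _ in range(T):
    eps = 0.1 * rng.normal(size=d)  # mean-zero noise process
    x = sigma(theta_star @ x) + eps
    trajectory.append(x)

trajectory = np.array(trajectory)  # shape (T+1, d): data for estimating Theta*
```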
no code implementations • 29 Feb 2020 • Dylan J. Foster, Max Simchowitz
We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances.
no code implementations • ICML 2020 • Dylan J. Foster, Alexander Rakhlin
We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression.
no code implementations • ICML 2020 • Max Simchowitz, Dylan J. Foster
Our upper bound is attained by a simple variant of $\textit{certainty equivalent control}$, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise.
no code implementations • 5 Dec 2019 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth
We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.
no code implementations • 15 Nov 2019 • Dylan J. Foster, Alexander Rakhlin
We show that the Rademacher complexity of any $\mathbb{R}^{K}$-valued function class composed with an $\ell_{\infty}$-Lipschitz function is bounded by the maximum Rademacher complexity of the restriction of the function class along each coordinate, times a factor of $\tilde{O}(\sqrt{K})$.
1 code implementation • NeurIPS 2019 • Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo
We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.
no code implementations • 30 May 2019 • Dylan J. Foster, Andrej Risteski
In agnostic tensor completion, we make no assumption on the rank of the unknown tensor, but attempt to predict unknown entries as well as the best rank-$r$ tensor.
no code implementations • NeurIPS 2019 • Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan
Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce.
no code implementations • 28 Feb 2019 • Jayadev Acharya, Christopher De Sa, Dylan J. Foster, Karthik Sridharan
In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.
no code implementations • 13 Feb 2019 • Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth
Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.
3 code implementations • 25 Jan 2019 • Dylan J. Foster, Vasilis Syrgkanis
We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target parameter depends on an unknown nuisance parameter that must be estimated from data.
no code implementations • NeurIPS 2018 • Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
We investigate 1) the rate at which refined properties of the empirical risk -- in particular, gradients -- converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization.
no code implementations • NeurIPS 2018 • Dylan J. Foster, Akshay Krishnamurthy
We use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning.
no code implementations • 25 Mar 2018 • Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan
Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm.
no code implementations • 20 Mar 2018 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
We uncover a fairly general principle in online learning: If regret can be (approximately) expressed as a function of certain "sufficient statistics" for the data sequence, then there exists a special Burkholder function that 1) can be used algorithmically to achieve the regret bound and 2) only depends on these sufficient statistics, not the entire data sequence, so that the online strategy is only required to keep the sufficient statistics in memory.
no code implementations • ICML 2018 • Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire
A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.
no code implementations • NeurIPS 2017 • Dylan J. Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan
We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning.
1 code implementation • NeurIPS 2017 • Peter Bartlett, Dylan J. Foster, Matus Telgarsky
This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized "spectral complexity": their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor.
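The Lipschitz-constant part of this quantity can be computed directly for a toy network (a sketch; the correction factor, which involves matrix $(2,1)$-norms, is omitted here, and the layer shapes are illustrative):

```python
import numpy as np

# Product of spectral norms of the weight matrices: the Lipschitz constant
# of the network's linear parts, the leading term in spectral complexity.
rng = np.random.default_rng(1)
weights = [
    rng.normal(size=(32, 16)),  # layer 1
    rng.normal(size=(16, 16)),  # layer 2
    rng.normal(size=(10, 16)),  # output layer
]

# ord=2 on a matrix gives its largest singular value (spectral norm)
spectral_norms = [np.linalg.norm(W, ord=2) for W in weights]
lipschitz_product = float(np.prod(spectral_norms))
```

The bound's key feature is that this product is margin-normalized, so rescaling the weights while rescaling the margin leaves the complexity measure unchanged.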
no code implementations • 13 Apr 2017 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
To develop a general theory of when this type of adaptive regret bound is achievable, we establish a connection to the theory of decoupling inequalities for martingales in Banach spaces.
no code implementations • 8 Mar 2017 • Dylan J. Foster, Daniel Reichman, Karthik Sridharan
For two-dimensional grids, our results improve over Globerson et al. (2015) by obtaining optimal recovery in the constant-height regime.
no code implementations • NeurIPS 2016 • Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, Eva Tardos
We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games.
no code implementations • NeurIPS 2015 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
We propose a general framework for studying adaptive regret bounds in the online learning framework, including model selection bounds and data-dependent bounds.