Search Results for author: Dylan J. Foster

Found 53 papers, 6 papers with code

The Power of Resets in Online Reinforcement Learning

no code implementations23 Apr 2024 Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin

We use local simulator access to unlock new statistical guarantees that were previously out of reach: we show that MDPs with low coverability (Xie et al., 2023) -- a general structural condition that subsumes Block MDPs and Low-Rank MDPs -- can be learned in a sample-efficient fashion with only $Q^{\star}$-realizability (realizability of the optimal state-action value function); existing online RL algorithms require significantly stronger representation conditions.

reinforcement-learning

Online Estimation via Offline Estimation: An Information-Theoretic Framework

no code implementations15 Apr 2024 Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin

Our main results settle the statistical and computational complexity of online estimation in this framework.

Decision Making Density Estimation

Can large language models explore in-context?

no code implementations22 Mar 2024 Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making.

Decision Making

Scalable Online Exploration via Coverability

1 code implementation11 Mar 2024 Philip Amortila, Dylan J. Foster, Akshay Krishnamurthy

We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward function -- as a conceptual framework to systematize the study of exploration.

Efficient Exploration Q-Learning +1

Harnessing Density Ratios for Online Reinforcement Learning

no code implementations18 Jan 2024 Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie

The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of possible unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other.

Offline RL reinforcement-learning

Foundations of Reinforcement Learning and Interactive Decision Making

no code implementations27 Dec 2023 Dylan J. Foster, Alexander Rakhlin

These lecture notes give a statistical perspective on the foundations of reinforcement learning and interactive decision making.

Decision Making Multi-Armed Bandits +1

Efficient Model-Free Exploration in Low-Rank MDPs

no code implementations NeurIPS 2023 Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin

A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation are required.

Representation Learning

Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

no code implementations24 Apr 2023 Andrew Wagenmaker, Dylan J. Foster

We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance.

Decision Making reinforcement-learning

Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL

1 code implementation12 Apr 2023 Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin

We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions.

Representation Learning

Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games

no code implementations22 Mar 2023 Dylan J. Foster, Noah Golowich, Sham M. Kakade

They are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is sparse in the sense that it can be represented as a mixture of a small number of product policies.

Computational Efficiency Multi-agent Reinforcement Learning

Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient

no code implementations19 Jan 2023 Dylan J. Foster, Noah Golowich, Yanjun Han

Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation.

Decision Making reinforcement-learning +1

Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression

no code implementations14 Nov 2022 Aleksandrs Slivkins, Karthik Abinav Sankararaman, Dylan J. Foster

We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.

Multi-Armed Bandits regression

The Role of Coverage in Online Reinforcement Learning

no code implementations9 Oct 2022 Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade

Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.

Efficient Exploration Offline RL +2

Contextual Bandits with Large Action Spaces: Made Practical

1 code implementation12 Jul 2022 Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

In the contextual bandit problem, recent progress has provided provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice.

Decision Making Multi-Armed Bandits

On the Complexity of Adversarial Decision Making

no code implementations27 Jun 2022 Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan

A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees.

Decision Making reinforcement-learning +1

Interaction-Grounded Learning with Action-inclusive Feedback

no code implementations16 Jun 2022 Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.

Brain Computer Interface

Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

no code implementations9 Jun 2022 Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford

In real-world reinforcement learning applications, the learner's observation space is typically high-dimensional, containing both relevant and irrelevant information about the task at hand.

reinforcement-learning Reinforcement Learning (RL)

The Statistical Complexity of Interactive Decision Making

no code implementations27 Dec 2021 Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin

The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.

Decision Making reinforcement-learning +1
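
For reference, the Decision-Estimation Coefficient of a model class $\mathcal{M}$ relative to a reference model $\widehat{M}$ takes roughly the following form (notation mine; see the paper for the precise variants):

$\mathrm{dec}_{\gamma}(\mathcal{M}, \widehat{M}) = \inf_{p \in \Delta(\Pi)} \sup_{M \in \mathcal{M}} \mathbb{E}_{\pi \sim p}\left[ f^{M}(\pi_{M}) - f^{M}(\pi) - \gamma \cdot D_{\mathrm{H}}^{2}\big(M(\pi), \widehat{M}(\pi)\big) \right]$

where $f^{M}(\pi)$ is the mean reward of decision $\pi$ under model $M$, $\pi_{M}$ is the optimal decision for $M$, and $D_{\mathrm{H}}^{2}$ denotes squared Hellinger distance. The coefficient trades off instantaneous regret against information gained about the underlying model.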

Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

no code implementations21 Nov 2021 Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu

This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL.

Decision Making Offline RL +2

Adapting to Misspecification in Contextual Bandits

no code implementations NeurIPS 2020 Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert

Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge.

Multi-Armed Bandits regression

Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

no code implementations NeurIPS 2021 Dylan J. Foster, Akshay Krishnamurthy

A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise, often quantified by the performance of the best hypothesis; such results are known as first-order or small-loss guarantees.

Decision Making Multi-Armed Bandits +1

Understanding the Eluder Dimension

no code implementations14 Apr 2021 Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro

We provide new insights on eluder dimension, a complexity measure that has been extensively used to bound the regret of algorithms for online bandits and reinforcement learning with function approximation.

Active Learning

Independent Policy Gradient Methods for Competitive Reinforcement Learning

no code implementations NeurIPS 2020 Constantinos Daskalakis, Dylan J. Foster, Noah Golowich

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games).

Policy Gradient Methods reinforcement-learning +1

Learning the Linear Quadratic Regulator from Nonlinear Observations

no code implementations NeurIPS 2020 Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.

Continuous Control

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

no code implementations7 Oct 2020 Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.

Active Learning Multi-Armed Bandits +2

Tight Bounds on Minimax Regret under Logarithmic Loss via Self-Concordance

no code implementations2 Jul 2020 Blair Bilodeau, Dylan J. Foster, Daniel M. Roy

We consider the classical problem of sequential probability assignment under logarithmic loss while competing against an arbitrary, potentially nonparametric class of experts.

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

no code implementations24 Jun 2020 Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed.

Second-order methods Stochastic Optimization
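
The stochastic Hessian-vector-product oracle assumed by results of this kind can be realized without ever forming the Hessian. A minimal sketch (my own illustration, not the paper's algorithm) using central-difference gradient evaluations:

```python
import numpy as np

def hvp(grad_f, x, v, r=1e-5):
    """Approximate the Hessian-vector product H(x) @ v from two gradient
    evaluations (central differences); the Hessian is never formed."""
    return (grad_f(x + r * v) - grad_f(x - r * v)) / (2 * r)

# Sanity check on F(x) = 0.5 * x' A x, whose gradient is A x and Hessian is A.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A + A.T
grad_f = lambda x: A @ x

x, v = rng.standard_normal(5), rng.standard_normal(5)
print(np.allclose(hvp(grad_f, x, v), A @ v, atol=1e-4))  # True
```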

Open Problem: Model Selection for Contextual Bandits

no code implementations19 Jun 2020 Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

In statistical learning, algorithms for model selection allow the learner to adapt to the complexity of the best hypothesis class in a sequence.

Model Selection Multi-Armed Bandits

Learning nonlinear dynamical systems from a single trajectory

no code implementations L4DC 2020 Dylan J. Foster, Alexander Rakhlin, Tuhin Sarkar

We introduce algorithms for learning nonlinear dynamical systems of the form $x_{t+1}=\sigma(\Theta^{\star}x_t)+\varepsilon_t$, where $\Theta^{\star}$ is a weight matrix, $\sigma$ is a nonlinear link function, and $\varepsilon_t$ is a mean-zero noise process.
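
As a toy illustration of the setup (not the estimator analyzed in the paper), one can simulate such a system from a single trajectory and fit $\Theta$ by gradient descent on the one-step squared prediction error; the link function $\sigma = \tanh$ and all constants below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 2000
Theta_star = 0.5 * rng.standard_normal((d, d))   # unknown weight matrix
sigma = np.tanh                                  # known link function

# Roll out x_{t+1} = sigma(Theta_star @ x_t) + eps_t along a single trajectory.
X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = sigma(Theta_star @ X[t]) + 0.5 * rng.standard_normal(d)

# Fit Theta by gradient descent on the empirical squared prediction error.
Theta = np.zeros((d, d))
for _ in range(2000):
    pred = np.tanh(X[:-1] @ Theta.T)                        # predicted x_{t+1}
    grad = ((pred - X[1:]) * (1.0 - pred ** 2)).T @ X[:-1] / T
    Theta -= 1.0 * grad

print(np.max(np.abs(Theta - Theta_star)))  # shrinks as T grows
```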

Logarithmic Regret for Adversarial Online Control

no code implementations29 Feb 2020 Dylan J. Foster, Max Simchowitz

We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances.

Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles

no code implementations ICML 2020 Dylan J. Foster, Alexander Rakhlin

We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression.

Multi-Armed Bandits regression
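
The action-selection rule at the heart of this reduction is inverse gap weighting: given the oracle's reward predictions for the current context, each non-greedy action is sampled with probability inversely proportional to its predicted gap from the best action. A minimal sketch of that rule as I understand it (the learning rate $\gamma$ and the predictions are placeholders; the regression oracle itself is omitted):

```python
import numpy as np

def igw_distribution(y_hat, gamma):
    """Inverse-gap-weighting: actions with a large predicted gap to the best
    action receive probability ~ 1 / (gamma * gap)."""
    A = len(y_hat)
    best = int(np.argmax(y_hat))
    p = np.zeros(A)
    for a in range(A):
        if a != best:
            p[a] = 1.0 / (A + gamma * (y_hat[best] - y_hat[a]))
    p[best] = 1.0 - p.sum()      # remaining mass goes to the greedy action
    return p

# Oracle predictions for four actions under the current context.
p = igw_distribution(np.array([0.9, 0.7, 0.5, 0.1]), gamma=20.0)
print(p, p.sum())
```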

Naive Exploration is Optimal for Online LQR

no code implementations ICML 2020 Max Simchowitz, Dylan J. Foster

Our upper bound is attained by a simple variant of certainty equivalent control, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise.
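
A toy sketch of this scheme (the system, constants, and single re-estimation step below are my own illustration, not the paper's exact algorithm): act with the optimal gain for the current estimate plus injected noise, then refit $(A, B)$ by least squares.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # true (unknown) dynamics
B = np.array([[0.0], [0.5]])
Q, R = np.eye(2), np.eye(1)

def ce_gain(A_hat, B_hat):
    """LQR gain that is optimal for the *estimated* system (certainty equivalence)."""
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

# Start from a rough estimate, act with exploratory noise, refit by least squares.
A_hat = A + 0.05 * rng.standard_normal((2, 2))
B_hat = B + 0.05 * rng.standard_normal((2, 1))
K = ce_gain(A_hat, B_hat)

x, Z, X_next = np.zeros(2), [], []
for t in range(500):
    u = -K @ x + 0.2 * rng.standard_normal(1)           # injected exploration noise
    x_new = A @ x + B @ u + 0.05 * rng.standard_normal(2)
    Z.append(np.concatenate([x, u]))
    X_next.append(x_new)
    x = x_new

Theta, *_ = np.linalg.lstsq(np.array(Z), np.array(X_next), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]
print(np.linalg.norm(A_hat - A), np.linalg.norm(B_hat - B))   # estimation error
```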

Lower Bounds for Non-Convex Stochastic Optimization

no code implementations5 Dec 2019 Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.

Stochastic Optimization

$\ell_{\infty}$ Vector Contraction for Rademacher Complexity

no code implementations15 Nov 2019 Dylan J. Foster, Alexander Rakhlin

We show that the Rademacher complexity of any $\mathbb{R}^{K}$-valued function class composed with an $\ell_{\infty}$-Lipschitz function is bounded by the maximum Rademacher complexity of the restriction of the function class along each coordinate, times a factor of $\tilde{O}(\sqrt{K})$.
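
In symbols (notation mine), for any $h:\mathbb{R}^{K}\to\mathbb{R}$ that is $L$-Lipschitz with respect to $\ell_{\infty}$, the statement reads

$\mathfrak{R}_{n}(h \circ \mathcal{F}) \le \tilde{O}\big(L\sqrt{K}\big) \cdot \max_{k \in [K]} \mathfrak{R}_{n}(\mathcal{F}|_{k}),$

where $\mathcal{F}|_{k}$ denotes the restriction of $\mathcal{F}$ to its $k$-th output coordinate.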

Model selection for contextual bandits

1 code implementation NeurIPS 2019 Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.

Model Selection Multi-Armed Bandits

Sum-of-squares meets square loss: Fast rates for agnostic tensor completion

no code implementations30 May 2019 Dylan J. Foster, Andrej Risteski

In agnostic tensor completion, we make no assumption on the rank of the unknown tensor, but attempt to predict unknown entries as well as the best rank-$r$ tensor.

Matrix Completion

Hypothesis Set Stability and Generalization

no code implementations NeurIPS 2019 Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce.

Distributed Learning with Sublinear Communication

no code implementations28 Feb 2019 Jayadev Acharya, Christopher De Sa, Dylan J. Foster, Karthik Sridharan

In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.

Quantization

The Complexity of Making the Gradient Small in Stochastic Convex Optimization

no code implementations13 Feb 2019 Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.

Stochastic Optimization

Orthogonal Statistical Learning

3 code implementations25 Jan 2019 Dylan J. Foster, Vasilis Syrgkanis

We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target parameter depends on an unknown nuisance parameter that must be estimated from data.

Domain Adaptation
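
A toy instance of this setting is the partially linear model with cross-fitting: the nuisance (conditional-mean) functions are estimated on one fold, and the target parameter on the other via a Neyman-orthogonal residual-on-residual moment. The sketch below is my own simplified illustration, not the paper's general framework:

```python
import numpy as np

# Partially linear model: Y = theta0 * T + g0(W) + noise,  T = m0(W) + noise,
# where g0, m0 are nuisance functions that must be estimated from data.
rng = np.random.default_rng(0)
n = 4000
W = rng.standard_normal(n)
T = np.sin(W) + 0.5 * rng.standard_normal(n)
theta0 = 1.5
Y = theta0 * T + np.cos(W) + 0.5 * rng.standard_normal(n)

def fit_poly(x, y, deg=5):
    """Cheap nuisance estimator: polynomial least squares in W."""
    coef = np.polyfit(x, y, deg)
    return lambda z: np.polyval(coef, z)

idx = rng.permutation(n)
folds = np.array_split(idx, 2)
estimates = []
for k in range(2):
    train, hold = folds[1 - k], folds[k]
    g_hat = fit_poly(W[train], Y[train])   # estimate of E[Y | W]
    m_hat = fit_poly(W[train], T[train])   # estimate of E[T | W]
    ry = Y[hold] - g_hat(W[hold])
    rt = T[hold] - m_hat(W[hold])
    estimates.append((rt @ ry) / (rt @ rt))   # residual-on-residual regression
print(np.mean(estimates))  # close to theta0 = 1.5
```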

Uniform Convergence of Gradients for Non-Convex Learning and Optimization

no code implementations NeurIPS 2018 Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

We investigate 1) the rate at which refined properties of the empirical risk -- in particular, gradients -- converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization.

Logistic Regression: The Importance of Being Improper

no code implementations25 Mar 2018 Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm.

regression

Online Learning: Sufficient Statistics and the Burkholder Method

no code implementations20 Mar 2018 Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan

We uncover a fairly general principle in online learning: If regret can be (approximately) expressed as a function of certain "sufficient statistics" for the data sequence, then there exists a special Burkholder function that 1) can be used algorithmically to achieve the regret bound and 2) only depends on these sufficient statistics, not the entire data sequence, so that the online strategy is only required to keep the sufficient statistics in memory.

Practical Contextual Bandits with Regression Oracles

no code implementations ICML 2018 Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire

A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.

General Classification Multi-Armed Bandits +1

Parameter-free online learning via model selection

no code implementations NeurIPS 2017 Dylan J. Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning.

Model Selection

Spectrally-normalized margin bounds for neural networks

1 code implementation NeurIPS 2017 Peter Bartlett, Dylan J. Foster, Matus Telgarsky

This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized "spectral complexity": their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor.
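
To make the leading term concrete, here is a minimal sketch that computes the product of spectral norms for a toy stack of weight matrices (the paper's correction factor, which involves $(2,1)$-norms of the layers, is omitted for brevity):

```python
import numpy as np

def spectral_norm_product(weight_matrices):
    """Product of the layers' spectral norms (largest singular values) --
    the leading term of the 'spectral complexity' in the margin bound."""
    return float(np.prod([np.linalg.norm(W, ord=2) for W in weight_matrices]))

# Weights of a toy 3-layer network mapping 32 -> 64 -> 32 -> 10.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 32)),
           rng.standard_normal((32, 64)),
           rng.standard_normal((10, 32))]
print(spectral_norm_product(weights))
```

The margin bound scales with this quantity (times the correction factor) divided by the margin achieved on the training data.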

ZigZag: A new approach to adaptive online learning

no code implementations13 Apr 2017 Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan

To develop a general theory of when this type of adaptive regret bound is achievable, we establish a connection to the theory of decoupling inequalities for martingales in Banach spaces.

Inference in Sparse Graphs with Pairwise Measurements and Side Information

no code implementations8 Mar 2017 Dylan J. Foster, Daniel Reichman, Karthik Sridharan

For two-dimensional grids, our results improve over Globerson et al. (2015) by obtaining optimal recovery in the constant-height regime.

Learning Theory Tree Decomposition

Learning in Games: Robustness of Fast Convergence

no code implementations NeurIPS 2016 Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, Eva Tardos

We show that learning algorithms satisfying a low approximate regret property experience fast convergence to approximate optimality in a large class of repeated games.

Adaptive Online Learning

no code implementations NeurIPS 2015 Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan

We propose a general framework for studying adaptive regret bounds in the online learning framework, including model selection bounds and data-dependent bounds.

Model Selection
