no code implementations • 23 Apr 2024 • Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
We use local simulator access to unlock new statistical guarantees that were previously out of reach: we show that MDPs with low coverability (Xie et al., 2023) -- a general structural condition that subsumes Block MDPs and Low-Rank MDPs -- can be learned in a sample-efficient fashion with only $Q^{\star}$-realizability (realizability of the optimal state-action value function); existing online RL algorithms require significantly stronger representation conditions.
no code implementations • 15 Apr 2024 • Dylan J. Foster, Yanjun Han, Jian Qian, Alexander Rakhlin
Our main results settle the statistical and computational complexity of online estimation in this framework.
no code implementations • 22 Mar 2024 • Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making.
1 code implementation • 11 Mar 2024 • Philip Amortila, Dylan J. Foster, Akshay Krishnamurthy
We propose exploration objectives -- policy optimization objectives that enable downstream maximization of any reward function -- as a conceptual framework to systematize the study of exploration.
no code implementations • 18 Jan 2024 • Philip Amortila, Dylan J. Foster, Nan Jiang, Ayush Sekhari, Tengyang Xie
The theories of offline and online reinforcement learning, despite having evolved in parallel, have begun to show signs of unification, with algorithms and analysis techniques for one setting often having natural counterparts in the other.
no code implementations • 27 Dec 2023 • Dylan J. Foster, Alexander Rakhlin
These lecture notes give a statistical perspective on the foundations of reinforcement learning and interactive decision making.
no code implementations • 17 Oct 2023 • Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang
This work studies training instabilities of behavior cloning with deep neural networks.
no code implementations • NeurIPS 2023 • Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin
A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation are required.
no code implementations • 1 May 2023 • Dylan J. Foster, Dean P. Foster, Noah Golowich, Alexander Rakhlin
Compared to the best results for the single-agent setting, our bounds have additional gaps.
no code implementations • 24 Apr 2023 • Andrew Wagenmaker, Dylan J. Foster
We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of real-world instances for improved performance.
1 code implementation • 12 Apr 2023 • Zakaria Mhammedi, Dylan J. Foster, Alexander Rakhlin
We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions.
no code implementations • 22 Mar 2023 • Dylan J. Foster, Noah Golowich, Sham M. Kakade
They are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is sparse in the sense that it can be represented as a mixture of a small number of product policies.
no code implementations • 19 Jan 2023 • Dylan J. Foster, Noah Golowich, Yanjun Han
Recently, Foster et al. (2021) introduced the Decision-Estimation Coefficient (DEC), a measure of statistical complexity which leads to upper and lower bounds on the optimal sample complexity for a general class of problems encompassing bandits and reinforcement learning with function approximation.
no code implementations • 14 Nov 2022 • Aleksandrs Slivkins, Karthik Abinav Sankararaman, Dylan J. Foster
We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption.
no code implementations • 9 Oct 2022 • Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade
Coverage conditions -- which assert that the data logging distribution adequately covers the state space -- play a fundamental role in determining the sample complexity of offline reinforcement learning.
1 code implementation • 12 Jul 2022 • Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro
In the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leaving a significant gap between theory and practice.
no code implementations • 27 Jun 2022 • Dylan J. Foster, Alexander Rakhlin, Ayush Sekhari, Karthik Sridharan
A central problem in online learning and decision making -- from bandits to reinforcement learning -- is to understand what modeling assumptions lead to sample-efficient learning guarantees.
no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford
Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.
no code implementations • 9 Jun 2022 • Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford
In real-world reinforcement learning applications, the learner's observation space is typically high-dimensional, containing both relevant and irrelevant information about the task at hand.
no code implementations • 27 Dec 2021 • Dylan J. Foster, Sham M. Kakade, Jian Qian, Alexander Rakhlin
The main result of this work provides a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
no code implementations • 21 Nov 2021 • Dylan J. Foster, Akshay Krishnamurthy, David Simchi-Levi, Yunzong Xu
This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL.
no code implementations • NeurIPS 2020 • Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert
Given access to an online oracle for square loss regression, our algorithm attains optimal regret and -- in particular -- optimal dependence on the misspecification level, with no prior knowledge.
no code implementations • NeurIPS 2021 • Dylan J. Foster, Akshay Krishnamurthy
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems with low noise, often quantified by the performance of the best hypothesis; such results are known as first-order or small-loss guarantees.
no code implementations • 14 Apr 2021 • Gene Li, Pritish Kamath, Dylan J. Foster, Nathan Srebro
We provide new insights on eluder dimension, a complexity measure that has been extensively used to bound the regret of algorithms for online bandits and reinforcement learning with function approximation.
no code implementations • NeurIPS 2020 • Constantinos Daskalakis, Dylan J. Foster, Noah Golowich
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games).
no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford
We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.
no code implementations • 7 Oct 2020 • Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm.
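The gap-dependent phenomenon can be illustrated with a minimal UCB1 simulation on a two-armed Bernoulli bandit (a sketch only; the arm means, horizon, and variable names below are illustrative, not from the paper):

```python
import numpy as np

# Two-armed Bernoulli bandit with gap Delta = 0.3 between the best and
# second-best arm. UCB1 pulls the suboptimal arm only O(log T / Delta^2)
# times, rather than a constant fraction of rounds.
rng = np.random.default_rng(0)
means = np.array([0.8, 0.5])
T = 5000
counts = np.zeros(2)
sums = np.zeros(2)

for t in range(T):
    if t < 2:
        arm = t  # pull each arm once to initialize the estimates
    else:
        # empirical mean plus a confidence bonus that shrinks with pulls
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < means[arm])
    counts[arm] += 1
    sums[arm] += reward

suboptimal_pulls = counts[1]  # grows logarithmically in T, not linearly
```

On an "easy" instance with a large gap, `suboptimal_pulls` stays a small fraction of the horizon, which is exactly the instance-dependent improvement the abstract refers to.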
no code implementations • 2 Jul 2020 • Blair Bilodeau, Dylan J. Foster, Daniel M. Roy
We consider the classical problem of sequential probability assignment under logarithmic loss while competing against an arbitrary, potentially nonparametric class of experts.
no code implementations • 24 Jun 2020 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed.
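As a sketch of the objects involved -- the $\epsilon$-stationarity criterion $\|\nabla F(x)\|\le \epsilon$ and a Hessian-vector product obtained from gradient queries alone -- consider a toy quadratic $F(x) = \tfrac{1}{2} x^\top A x$ (all names below are illustrative, not from the paper):

```python
import numpy as np

# Toy objective F(x) = 0.5 * x^T A x with analytic gradient A x.
A = np.diag([1.0, 2.0, 3.0])
grad_F = lambda x: A @ x

def hessian_vector_product(x, v, h=1e-5):
    # (grad F(x + h v) - grad F(x)) / h approximates H(x) v using only
    # gradient evaluations, without ever forming the Hessian.
    return (grad_F(x + h * v) - grad_F(x)) / h

x = np.array([1e-4, 0.0, 0.0])
eps = 1e-3
# epsilon-approximate stationary point: gradient norm at most eps
is_stationary = np.linalg.norm(grad_F(x)) <= eps
```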
no code implementations • 19 Jun 2020 • Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo
In statistical learning, algorithms for model selection allow the learner to adapt to the complexity of the best hypothesis class in a sequence.
no code implementations • L4DC 2020 • Dylan J. Foster, Alexander Rakhlin, Tuhin Sarkar
We introduce algorithms for learning nonlinear dynamical systems of the form $x_{t+1}=\sigma(\Theta^{\star}x_t)+\varepsilon_t$, where $\Theta^{\star}$ is a weight matrix, $\sigma$ is a nonlinear link function, and $\varepsilon_t$ is a mean-zero noise process.
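A minimal simulation of such a system, generating the trajectory data a learner would observe (assuming $\sigma = \tanh$ as the link function for concreteness; the paper treats a class of links):

```python
import numpy as np

# Simulate x_{t+1} = sigma(Theta* x_t) + eps_t with sigma = tanh.
rng = np.random.default_rng(0)
d, T = 3, 200
theta_star = rng.normal(size=(d, d)) / np.sqrt(d)  # unknown weight matrix
sigma = np.tanh                                    # nonlinear link function

x = np.zeros(d)
trajectory = [x]
for _ in range(T):
    eps = 0.1 * rng.normal(size=d)  # mean-zero noise process
    x = sigma(theta_star @ x) + eps
    trajectory.append(x)

trajectory = np.array(trajectory)  # shape (T+1, d): data for estimating Theta*
```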
no code implementations • 29 Feb 2020 • Dylan J. Foster, Max Simchowitz
We introduce a new algorithm for online linear-quadratic control in a known system subject to adversarial disturbances.
no code implementations • ICML 2020 • Dylan J. Foster, Alexander Rakhlin
We characterize the minimax rates for contextual bandits with general, potentially nonparametric function classes, and show that our algorithm is minimax optimal whenever the oracle obtains the optimal rate for regression.
no code implementations • ICML 2020 • Max Simchowitz, Dylan J. Foster
Our upper bound is attained by a simple variant of $\textit{certainty equivalent control}$, where the learner selects control inputs according to the optimal controller for their estimate of the system while injecting exploratory random noise.
no code implementations • 5 Dec 2019 • Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth
We lower bound the complexity of finding $\epsilon$-stationary points (with gradient norm at most $\epsilon$) using stochastic first-order methods.
no code implementations • 15 Nov 2019 • Dylan J. Foster, Alexander Rakhlin
We show that the Rademacher complexity of any $\mathbb{R}^{K}$-valued function class composed with an $\ell_{\infty}$-Lipschitz function is bounded by the maximum Rademacher complexity of the restriction of the function class along each coordinate, times a factor of $\tilde{O}(\sqrt{K})$.
1 code implementation • NeurIPS 2019 • Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo
We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.
no code implementations • 30 May 2019 • Dylan J. Foster, Andrej Risteski
In agnostic tensor completion, we make no assumption on the rank of the unknown tensor, but attempt to predict unknown entries as well as the best rank-$r$ tensor.
no code implementations • NeurIPS 2019 • Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan
Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce.
no code implementations • 28 Feb 2019 • Jayadev Acharya, Christopher De Sa, Dylan J. Foster, Karthik Sridharan
In distributed statistical learning, $N$ samples are split across $m$ machines and a learner wishes to use minimal communication to learn as well as if the examples were on a single machine.
no code implementations • 13 Feb 2019 • Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth
Notably, we show that in the global oracle/statistical learning model, only logarithmic dependence on smoothness is required to find a near-stationary point, whereas polynomial dependence on smoothness is necessary in the local stochastic oracle model.
3 code implementations • 25 Jan 2019 • Dylan J. Foster, Vasilis Syrgkanis
We provide non-asymptotic excess risk guarantees for statistical learning in a setting where the population risk with respect to which we evaluate the target parameter depends on an unknown nuisance parameter that must be estimated from data.
no code implementations • NeurIPS 2018 • Dylan J. Foster, Ayush Sekhari, Karthik Sridharan
We investigate 1) the rate at which refined properties of the empirical risk -- in particular, gradients -- converge to their population counterparts in standard non-convex learning tasks, and 2) the consequences of this convergence for optimization.
no code implementations • NeurIPS 2018 • Dylan J. Foster, Akshay Krishnamurthy
We use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning.
no code implementations • 25 Mar 2018 • Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan
Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm.
no code implementations • 20 Mar 2018 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
We uncover a fairly general principle in online learning: If regret can be (approximately) expressed as a function of certain "sufficient statistics" for the data sequence, then there exists a special Burkholder function that 1) can be used algorithmically to achieve the regret bound and 2) only depends on these sufficient statistics, not the entire data sequence, so that the online strategy is only required to keep the sufficient statistics in memory.
no code implementations • ICML 2018 • Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire
A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.
no code implementations • NeurIPS 2017 • Dylan J. Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan
We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning.
1 code implementation • NeurIPS 2017 • Peter Bartlett, Dylan J. Foster, Matus Telgarsky
This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized "spectral complexity": their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor.
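The Lipschitz-constant part of this quantity can be computed directly for a toy network (a sketch; the correction factor, which involves matrix $(2,1)$-norms, is omitted here, and the layer shapes are illustrative):

```python
import numpy as np

# Product of spectral norms of the weight matrices: the Lipschitz constant
# of the network's linear parts, the leading term in spectral complexity.
rng = np.random.default_rng(1)
weights = [
    rng.normal(size=(32, 16)),  # layer 1
    rng.normal(size=(16, 16)),  # layer 2
    rng.normal(size=(10, 16)),  # output layer
]

# ord=2 on a matrix gives its largest singular value (spectral norm)
spectral_norms = [np.linalg.norm(W, ord=2) for W in weights]
lipschitz_product = float(np.prod(spectral_norms))
```

The bound's key feature is that this product is margin-normalized, so rescaling the weights while rescaling the margin leaves the complexity measure unchanged.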
no code implementations • 13 Apr 2017 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
To develop a general theory of when this type of adaptive regret bound is achievable, we establish a connection to the theory of decoupling inequalities for martingales in Banach spaces.
no code implementations • 8 Mar 2017 • Dylan J. Foster, Daniel Reichman, Karthik Sridharan
For two-dimensional grids, our results improve over Globerson et al. (2015) by obtaining optimal recovery in the constant-height regime.
no code implementations • NeurIPS 2016 • Dylan J. Foster, Zhiyuan Li, Thodoris Lykouris, Karthik Sridharan, Eva Tardos
We show that learning algorithms satisfying a $\textit{low approximate regret}$ property experience fast convergence to approximate optimality in a large class of repeated games.
no code implementations • NeurIPS 2015 • Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan
We propose a general framework for studying adaptive regret bounds in the online learning framework, including model selection bounds and data-dependent bounds.