no code implementations • 4 Feb 2023 • Audrey Huang, Jinglin Chen, Nan Jiang
As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage.
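The "multiplicative definition of data coverage" referenced here is commonly formalized via a concentrability coefficient; a standard (not necessarily the paper's exact) form, comparing a policy's occupancy $d^\pi$ against the data distribution $\mu$, is:

```latex
C_{\pi} \;=\; \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)},
```

so coverage is measured by a density *ratio*, which an additive occupancy-estimation error does not directly control.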
no code implementations • 27 Oct 2022 • Audrey Huang, Nan Jiang
Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios).
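For the first task, a textbook baseline (not this paper's method) is trajectory-level importance sampling: reweight logged returns by the ratio of target to behavior action probabilities. A minimal sketch, with hypothetical inputs:

```python
import numpy as np

def is_return_estimate(trajectories, target_probs, behavior_probs, gamma=0.99):
    """Standard importance-sampling estimate of a target policy's expected
    return from logged data (illustrative, not the paper's estimator).

    trajectories:   list of per-trajectory reward lists
    target_probs:   per-step probabilities of the logged actions under the
                    target policy (hypothetical inputs for illustration)
    behavior_probs: same, under the behavior (logging) policy
    """
    estimates = []
    for rewards, pi, mu in zip(trajectories, target_probs, behavior_probs):
        # Trajectory importance weight: product of per-step probability ratios.
        weight = float(np.prod(np.asarray(pi) / np.asarray(mu)))
        # Discounted return of the logged trajectory.
        ret = sum(g * r for g, r in zip(gamma ** np.arange(len(rewards)), rewards))
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

When the behavior and target policies agree, every weight is 1 and the estimate reduces to the average logged return.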
no code implementations • 21 Sep 2022 • Audrey Huang, Liu Leqi, Zachary Chase Lipton, Kamyar Azizzadenesheli
To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs.
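For context, the classic doubly robust estimator for the *expected return* (a simpler relative of the paper's DR estimator for the full CDF of returns) combines a learned model with importance weights via a backward recursion. A sketch with hypothetical inputs:

```python
def dr_estimate(rewards, rhos, q_hats, v_hats, gamma=0.99):
    """Doubly robust OPE estimate of expected return for one trajectory
    (illustrative; the paper targets the CDF of returns instead).

    rhos[t]   = pi(a_t|s_t) / mu(a_t|s_t)   per-step importance ratio
    q_hats[t] = Qhat(s_t, a_t)              model Q-value estimate
    v_hats[t] = Vhat(s_t)                   model state-value estimate
    """
    estimate = 0.0
    # Backward recursion:
    #   V_DR(t) = Vhat(s_t) + rho_t * (r_t + gamma * V_DR(t+1) - Qhat(s_t, a_t))
    for r, rho, q, v in zip(reversed(rewards), reversed(rhos),
                            reversed(q_hats), reversed(v_hats)):
        estimate = v + rho * (r + gamma * estimate - q)
    return estimate
```

If the importance ratios are zeroed out, the estimate falls back to the model's value $\hat V(s_0)$; if the model is zero, it reduces to per-decision importance sampling — the "doubly robust" trade-off.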
no code implementations • 27 Jun 2022 • Liu Leqi, Audrey Huang, Zachary C. Lipton, Kamyar Azizzadenesheli
Standard uniform convergence results bound the generalization gap of the expected loss over a hypothesis class.
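The standard shape of such a bound (background, not this paper's result): with probability at least $1-\delta$ over an i.i.d. sample $Z_1,\dots,Z_n$,

```latex
\sup_{h \in \mathcal{H}} \left| \mathbb{E}\,\ell(h, Z) - \frac{1}{n}\sum_{i=1}^{n} \ell(h, Z_i) \right|
\;\le\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) + \sqrt{\frac{\log(1/\delta)}{2n}},
```

where $\mathfrak{R}_n$ is the Rademacher complexity of the loss class — a bound on the gap of the *expected* loss, which is the setting this work moves beyond.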
no code implementations • 9 Feb 2022 • Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).
no code implementations • NeurIPS 2021 • Audrey Huang, Liu Leqi, Zachary C. Lipton, Kamyar Azizzadenesheli
Even when unable to run experiments, practitioners can evaluate prospective policies using previously logged data.
no code implementations • 4 Mar 2021 • Audrey Huang, Liu Leqi, Zachary C. Lipton, Kamyar Azizzadenesheli
Because optimizing the coherent risk is difficult in Markov decision processes, recent work tends to focus on the Markov coherent risk (MCR), a time-consistent surrogate.
no code implementations • 11 Jul 2019 • Maximilian Sieb, Zhou Xian, Audrey Huang, Oliver Kroemer, Katerina Fragkiadaki
We cast visual imitation as a visual correspondence problem.