Search Results for author: Audrey Huang

Found 8 papers, 0 papers with code

Reinforcement Learning in Low-Rank MDPs with Density Features

no code implementations • 4 Feb 2023 • Audrey Huang, Jinglin Chen, Nan Jiang

As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage.

Reinforcement Learning (RL) +1
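
The tension described in the snippet above can be made concrete with standard definitions (a hedged sketch using generic notation, not necessarily the paper's):

```latex
% Multiplicative notion of data coverage (concentrability of pi w.r.t. data distribution mu):
C_\pi \;=\; \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)}

% Additive guarantee typical of occupancy (density) estimation:
\lVert \hat{d}^{\pi} - d^{\pi} \rVert_1 \;\le\; \varepsilon

% The mismatch: an additive error of size eps can dominate d^pi(s,a) wherever mu(s,a)
% is small, so \hat{d}^{\pi}/\mu need not be close to the ratio defining coverage.
```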

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions

no code implementations • 27 Oct 2022 • Audrey Huang, Nan Jiang

Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios).

Off-policy evaluation
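
For reference, the two estimands distinguished in the snippet can be written as follows (standard off-policy evaluation notation, assumed here rather than taken from the paper):

```latex
% Scalar estimand: the expected discounted return of the target policy pi
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]

% Function estimands: the value function of pi, and a density (occupancy) ratio
Q^{\pi}(s,a) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s,\, a_0 = a\right],
\qquad
w^{\pi}(s,a) \;=\; \frac{d^{\pi}(s,a)}{\mu(s,a)}

% Estimating J(pi) only requires accuracy on average, whereas function estimation must be
% accurate under some error-measuring distribution over state-action pairs.
```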

Off-Policy Risk Assessment in Markov Decision Processes

no code implementations • 21 Sep 2022 • Audrey Huang, Liu Leqi, Zachary Chase Lipton, Kamyar Azizzadenesheli

To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs.

Multi-Armed Bandits
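
A minimal sketch of what a doubly robust estimate of the return CDF could look like, assuming per-trajectory importance weights and a model-based CDF are available; the names below are illustrative and this is not the estimator from the paper:

```python
import numpy as np

def dr_cdf_estimate(thresholds, returns_b, weights, model_cdf):
    """Hedged sketch of a doubly robust (DR) estimate of the return CDF.

    thresholds : sorted return thresholds t at which to estimate P(G <= t)
    returns_b  : returns G_i observed under the behavior policy
    weights    : per-trajectory importance weights for the target policy
    model_cdf  : callable t -> model-based estimate of P(G <= t) under the target policy
    """
    returns_b = np.asarray(returns_b, dtype=float)
    weights = np.asarray(weights, dtype=float)
    estimates = []
    for t in thresholds:
        direct = model_cdf(t)  # model-based term (biased if the model is wrong)
        # importance-weighted correction of the model's residual at threshold t
        correction = weights * ((returns_b <= t).astype(float) - direct)
        estimates.append(direct + correction.mean())
    # enforce a valid CDF: values in [0, 1] and monotone in t
    return np.clip(np.maximum.accumulate(estimates), 0.0, 1.0)
```

As with standard DR estimators of a mean, the intuition is that the estimate stays consistent if either the model-based CDF or the importance weights are accurate.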

Supervised Learning with General Risk Functionals

no code implementations • 27 Jun 2022 • Liu Leqi, Audrey Huang, Zachary C. Lipton, Kamyar Azizzadenesheli

Standard uniform convergence results bound the generalization gap of the expected loss over a hypothesis class.
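
The standard uniform convergence statement referenced here has, up to constants, the following form for a loss bounded in [0, 1] (a generic textbook bound, not a result from the paper):

```latex
% With probability at least 1 - delta over an i.i.d. sample of size n,
\sup_{h \in \mathcal{H}} \big| R(h) - \widehat{R}_n(h) \big|
  \;\le\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) \;+\; \sqrt{\frac{\log(2/\delta)}{2n}}

% R(h) is the expected loss, \widehat{R}_n(h) the empirical loss, and
% \mathfrak{R}_n the Rademacher complexity of the loss class.
% Such bounds only control the expected loss; more general risk functionals
% (e.g., tail or distortion risks) depend on the whole loss distribution.
```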

Offline Reinforcement Learning with Realizability and Single-policy Concentrability

no code implementations • 9 Feb 2022 • Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee

Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).

Offline RL • Reinforcement Learning (RL) +1
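
For context, the two kinds of assumptions contrasted in the snippet can be stated roughly as follows (standard definitions assumed here; notation may differ from the paper):

```latex
% All-policy concentrability (strong coverage assumption):
C_{\mathrm{all}} \;=\; \sup_{\pi}\, \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)}

% Single-policy concentrability (only the comparator policy pi* must be covered):
C_{\pi^{*}} \;=\; \sup_{s,a} \frac{d^{\pi^{*}}(s,a)}{\mu(s,a)}

% Function-class assumptions on a class F:
% Bellman-completeness (strong):  \mathcal{T}^{\pi} f \in \mathcal{F} \text{ for every } f \in \mathcal{F}
% Realizability (weak):           only the relevant value function is required to lie in \mathcal{F}
```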

On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk

no code implementations • 4 Mar 2021 • Audrey Huang, Liu Leqi, Zachary C. Lipton, Kamyar Azizzadenesheli

Because optimizing the coherent risk is difficult in Markov decision processes, recent work tends to focus on the Markov coherent risk (MCR), a time-consistent surrogate.
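
Roughly, the Markov coherent risk replaces the expectation in the Bellman recursion with a one-step coherent risk measure applied at every stage, which is what makes it time-consistent (a standard formulation assumed here, not quoted from the paper):

```latex
% MCR value of a policy pi: apply a one-step coherent risk measure rho recursively
V^{\pi}_{\rho}(s) \;=\; \rho_{\,a \sim \pi(\cdot \mid s),\; s' \sim P(\cdot \mid s, a)}
    \Big( r(s,a) + \gamma\, V^{\pi}_{\rho}(s') \Big)

% rho can be any coherent risk measure, e.g., CVaR_alpha over the lower tail of returns;
% taking rho to be the expectation recovers the usual Bellman equation.
```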
