no code implementations • 4 Feb 2023 • Audrey Huang, Jinglin Chen, Nan Jiang
As a central technical challenge, the additive error of occupancy estimation is incompatible with the multiplicative definition of data coverage.
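The "multiplicative definition of data coverage" referenced here is commonly formalized via a concentrability coefficient; a standard (not necessarily the paper's exact) form, comparing a policy's occupancy $d^\pi$ against the data distribution $\mu$, is:

```latex
C_{\pi} \;=\; \sup_{s,a} \frac{d^{\pi}(s,a)}{\mu(s,a)},
```

so coverage is measured by a density *ratio*, which an additive occupancy-estimation error does not directly control.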
no code implementations • 27 Oct 2022 • Audrey Huang, Nan Jiang
Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or other functions of interest, such as density ratios).
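For the first task, a textbook baseline (not this paper's method) is trajectory-level importance sampling: reweight logged returns by the ratio of target to behavior action probabilities. A minimal sketch, with hypothetical inputs:

```python
import numpy as np

def is_return_estimate(trajectories, target_probs, behavior_probs, gamma=0.99):
    """Standard importance-sampling estimate of a target policy's expected
    return from logged data (illustrative, not the paper's estimator).

    trajectories:   list of per-trajectory reward lists
    target_probs:   per-step probabilities of the logged actions under the
                    target policy (hypothetical inputs for illustration)
    behavior_probs: same, under the behavior (logging) policy
    """
    estimates = []
    for rewards, pi, mu in zip(trajectories, target_probs, behavior_probs):
        # Trajectory importance weight: product of per-step probability ratios.
        weight = float(np.prod(np.asarray(pi) / np.asarray(mu)))
        # Discounted return of the logged trajectory.
        ret = sum(g * r for g, r in zip(gamma ** np.arange(len(rewards)), rewards))
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

When the behavior and target policies agree, every weight is 1 and the estimate reduces to the average logged return.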
no code implementations • 21 Sep 2022 • Audrey Huang, Liu Leqi, Zachary Chase Lipton, Kamyar Azizzadenesheli
To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs.
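For context, the classic doubly robust estimator for the *expected return* (a simpler relative of the paper's DR estimator for the full CDF of returns) combines a learned model with importance weights via a backward recursion. A sketch with hypothetical inputs:

```python
def dr_estimate(rewards, rhos, q_hats, v_hats, gamma=0.99):
    """Doubly robust OPE estimate of expected return for one trajectory
    (illustrative; the paper targets the CDF of returns instead).

    rhos[t]   = pi(a_t|s_t) / mu(a_t|s_t)   per-step importance ratio
    q_hats[t] = Qhat(s_t, a_t)              model Q-value estimate
    v_hats[t] = Vhat(s_t)                   model state-value estimate
    """
    estimate = 0.0
    # Backward recursion:
    #   V_DR(t) = Vhat(s_t) + rho_t * (r_t + gamma * V_DR(t+1) - Qhat(s_t, a_t))
    for r, rho, q, v in zip(reversed(rewards), reversed(rhos),
                            reversed(q_hats), reversed(v_hats)):
        estimate = v + rho * (r + gamma * estimate - q)
    return estimate
```

If the importance ratios are zeroed out, the estimate falls back to the model's value $\hat V(s_0)$; if the model is zero, it reduces to per-decision importance sampling — the "doubly robust" trade-off.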
no code implementations • 27 Jun 2022 • Liu Leqi, Audrey Huang, Zachary C. Lipton, Kamyar Azizzadenesheli
Standard uniform convergence results bound the generalization gap of the expected loss over a hypothesis class.
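The standard shape of such a bound (background, not this paper's result): with probability at least $1-\delta$ over an i.i.d. sample $Z_1,\dots,Z_n$,

```latex
\sup_{h \in \mathcal{H}} \left| \mathbb{E}\,\ell(h, Z) - \frac{1}{n}\sum_{i=1}^{n} \ell(h, Z_i) \right|
\;\le\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) + \sqrt{\frac{\log(1/\delta)}{2n}},
```

where $\mathfrak{R}_n$ is the Rademacher complexity of the loss class — a bound on the gap of the *expected* loss, which is the setting this work moves beyond.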
no code implementations • 9 Feb 2022 • Wenhao Zhan, Baihe Huang, Audrey Huang, Nan Jiang, Jason D. Lee
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong assumptions on both the function classes (e.g., Bellman-completeness) and the data coverage (e.g., all-policy concentrability).
no code implementations • NeurIPS 2021 • Audrey Huang, Liu Leqi, Zachary C. Lipton, Kamyar Azizzadenesheli
Even when unable to run experiments, practitioners can evaluate prospective policies using previously logged data.
no code implementations • 4 Mar 2021 • Audrey Huang, Liu Leqi, Zachary C. Lipton, Kamyar Azizzadenesheli
Because optimizing the coherent risk is difficult in Markov decision processes, recent work tends to focus on the Markov coherent risk (MCR), a time-consistent surrogate.
no code implementations • 11 Jul 2019 • Maximilian Sieb, Zhou Xian, Audrey Huang, Oliver Kroemer, Katerina Fragkiadaki
We cast visual imitation as a visual correspondence problem.