14 Jun 2023 • Zeyu Bian, Chengchun Shi, Zhengling Qi, Lan Wang
This work studies off-policy evaluation (OPE) in settings where two key reinforcement learning (RL) assumptions, temporal stationarity and individual homogeneity, are both violated.