ICLR 2020 • Xinyun Chen, Lu Wang, Yizhe Hang, Heng Ge, Hongyuan Zha
We consider off-policy policy evaluation when the trajectory data are generated by multiple behavior policies.
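
To make the setting concrete, here is a minimal sketch of evaluating a target policy from trajectories logged by several behavior policies. It assumes ordinary per-trajectory importance sampling with known behavior policies, which is a standard baseline and not necessarily the estimator proposed in the paper. The function name `is_estimate`, the tabular policy arrays, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def is_estimate(trajectories, target_policy, behavior_policies, gamma=1.0):
    """Ordinary importance-sampling estimate of the target policy's value.

    Assumed (hypothetical) data layout:
      trajectories: list of (k, steps); k indexes the behavior policy that
        generated the trajectory, steps is a list of (state, action, reward).
      target_policy, behavior_policies[k]: arrays of shape
        (n_states, n_actions) holding action probabilities.
    """
    estimates = []
    for k, steps in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(steps):
            # Reweight by the ratio against the policy that actually
            # generated this trajectory.
            weight *= target_policy[s, a] / behavior_policies[k][s, a]
            ret += gamma**t * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy one-state, two-action problem with two behavior policies.
target = np.array([[1.0, 0.0]])
behaviors = [np.array([[0.5, 0.5]]), np.array([[0.25, 0.75]])]
data = [
    (0, [(0, 0, 1.0)]),
    (0, [(0, 1, 0.0)]),
    (1, [(0, 0, 1.0)]),
    (1, [(0, 1, 0.0)]),
]
value = is_estimate(data, target, behaviors)
```

Each trajectory is reweighted against its own behavior policy, so this sketch needs to know which policy produced each trajectory. That requirement is one reason the multiple-behavior-policy case is harder than the single-policy case.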