NeurIPS 2021 • Anish Agarwal, Abdullah Alomar, Varkey Alumootil, Devavrat Shah, Dennis Shen, Zhi Xu, Cindy Yang
We consider offline reinforcement learning (RL) with heterogeneous agents under severe data scarcity, i.e., we only observe a single historical trajectory for every agent under an unknown, potentially sub-optimal policy.