no code implementations • 23 Feb 2024 • Haoming Li, Yusen Huo, Shuai Dou, Zhenzhe Zheng, Zhilin Zhang, Chuan Yu, Jian Xu, Fan Wu
The trained policy can subsequently be deployed for further data collection, resulting in an iterative training framework, which we refer to as iterative offline RL.
1 code implementation • 13 Oct 2022 • Zhiyu Mou, Yusen Huo, Rongquan Bai, Mingzhou Xie, Chuan Yu, Jian Xu, Bo Zheng
Due to safety concerns, it was believed that the RL training process could only be carried out in an offline virtual advertising system (VAS) built from the historical data generated in the real-world advertising system (RAS).
no code implementations • 30 Sep 2019 • Yusen Huo, Qinghua Tao, Jianming Hu
In the proposed model, a multi-task learning structure is used to learn the cooperative policy.