no code implementations • 16 Jul 2022 • Fanglin Chen, Xiao Liu, Bo Tang, Feiyu Xiong, Serim Hwang, Guomian Zhuang
During deployment, we combine the offline RL model with the LP model to generate a robust policy under the budget constraints.
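One way such a combination could work is sketched below. It is an illustration only, not the authors' implementation. It assumes that the offline RL model supplies a per-user value estimate, that each treatment has a known positive cost, and that the LP is the single-budget relaxation with fractional decisions between 0 and 1. Under those assumptions the LP optimum is obtained by funding users in descending order of value per unit cost. The names `allocate_under_budget`, `q_values`, and `costs` are hypothetical.

```python
def allocate_under_budget(values, costs, budget):
    """Solve: maximize sum(v_i * x_i)
       subject to sum(c_i * x_i) <= budget and 0 <= x_i <= 1.

    With a single budget constraint and positive costs, sorting by
    value-per-cost and filling greedily gives the exact LP optimum.
    """
    # Users with non-positive estimated value are never funded.
    order = sorted(
        (i for i in range(len(values)) if values[i] > 0),
        key=lambda i: values[i] / costs[i],
        reverse=True,
    )
    x = [0.0] * len(values)
    remaining = budget
    for i in order:
        if remaining <= 0:
            break
        # Fund fully if affordable, otherwise fund the affordable fraction.
        frac = min(1.0, remaining / costs[i])
        x[i] = frac
        remaining -= frac * costs[i]
    return x


# Hypothetical value estimates standing in for an offline RL model's output.
q_values = [5.0, 3.0, 4.0, -1.0]
costs = [2.0, 1.0, 4.0, 1.0]
allocation = allocate_under_budget(q_values, costs, budget=4.0)
print(allocation)
```

A fractional entry can be read as a treatment probability for that user. A real deployment with several treatment levels per user or multiple constraints would need a general LP solver rather than this greedy rule.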
Offline RL • reinforcement-learning