26 Jul 2023 • Xumei Xi, Yuke Zhao, Quan Liu, Liwen Ouyang, Yang Wu
To this end, we train a farsighted recommender with an offline RL algorithm, initializing the policy network in our model architecture from a pre-trained transformer model.
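
As a rough illustration of why an offline RL objective can yield a "farsighted" recommender, the toy sketch below runs tabular Q-learning over a fixed log of interactions, warm-started from scores that stand in for a pre-trained model. It is not the authors' method: the tabular learner replaces the transformer policy network, and the states, rewards, `LOGGED`, `PRETRAINED_Q`, `train_offline`, and `greedy` are all hypothetical names and values chosen for the example.

```python
import copy

# Hypothetical logged interactions: (state, action, reward, next_state, done).
# State 0 is a fresh session. Action 0 is a "clickbait" item (high immediate
# reward, nothing afterwards); action 1 is a "quality" item (lower immediate
# reward, but it leads to state 1, where the user engages further).
LOGGED = [
    (0, 0, 1.0, 2, False),
    (0, 1, 0.5, 1, False),
    (1, 0, 2.0, None, True),
    (1, 1, 2.0, None, True),
    (2, 0, 0.0, None, True),
    (2, 1, 0.0, None, True),
]

# Assumed stand-in for a pre-trained model: initial action scores that only
# reflect immediate feedback, so they prefer the clickbait item in state 0.
PRETRAINED_Q = {0: [1.0, 0.5], 1: [0.0, 0.0], 2: [0.0, 0.0]}


def train_offline(logged, init_q, gamma, lr=0.5, epochs=200):
    """Q-learning over a fixed dataset; no new interactions are collected."""
    q = copy.deepcopy(init_q)  # warm start from the "pre-trained" scores
    for _ in range(epochs):
        for s, a, r, s_next, done in logged:
            # Bootstrapped target: immediate reward plus discounted future value.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


def greedy(q, state):
    """Return the action with the highest learned value in the given state."""
    return max(range(len(q[state])), key=lambda a: q[state][a])


q_far = train_offline(LOGGED, PRETRAINED_Q, gamma=0.9)     # values long-term reward
q_myopic = train_offline(LOGGED, PRETRAINED_Q, gamma=0.0)  # immediate reward only

print("farsighted choice in state 0:", greedy(q_far, 0))
print("myopic choice in state 0:", greedy(q_myopic, 0))
```

With a discount factor of 0.9, the learned value of the quality item in state 0 approaches 0.5 + 0.9 × 2.0 = 2.3, so the greedy policy picks it over the clickbait item (value 1.0). With a discount factor of 0, the warm-started preference for clickbait is left unchanged.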