Policy Gradient for items Recommendation on Virtual Taobao

CUHK Course IERG5350 2020 · Wei Fang, Fanyuan Zeng

In recent years, digital content has appeared in many forms in people's daily lives, including online courses, online shopping, and e-news. This brings both opportunities and challenges for systems that provide users with personalized services and information. The goal of our project is to design a recommender algorithm that returns a list of items consumers are likely to click, in a simulated environment named Virtual Taobao, a simulator trained on real data from Taobao. We first tried several state-of-the-art deep reinforcement learning algorithms, such as deep deterministic policy gradient (DDPG) and Twin Delayed DDPG (TD3). We also used the Proximal Policy Optimization (PPO) algorithm and tried to improve it using features derived from the consumers' attributes. Video Link: https://drive.google.com/file/d/1WVoAjKcJ-4t5o6n_U5KaoDn7BymksYqr/view?usp=sharing
