Policy Gradient for items Recommendation on Virtual Taobao

CUHK Course IERG5350 2020 · Wei Fang, Fanyuan Zeng

In recent years, digital content has appeared in many forms in people's daily lives, including online courses, online shopping, and e-news. This creates both opportunities and challenges for systems that provide users with personalized services and information. The goal of our project is to design a recommendation algorithm that returns a list of items that consumers have a high chance of clicking, evaluated in a simulated environment named Virtual Taobao, a simulator trained on real data from Taobao. We first tried several state-of-the-art deep reinforcement learning algorithms, namely deep deterministic policy gradient (DDPG) and Twin Delayed DDPG (TD3). We also used the Proximal Policy Optimization (PPO) algorithm and tried to improve it using features derived from the consumers' attributes.

Video link: https://drive.google.com/file/d/1WVoAjKcJ-4t5o6n_U5KaoDn7BymksYqr/view?usp=sharing

Methods

Adam • Batch Normalization • Convolution • DDPG • Dense Connections • Entropy Regularization • Experience Replay • PPO • ReLU • Weight Decay
