Search Results for author: Zaifan Jiang

Found 5 papers, 0 papers with code

Preference as Reward, Maximum Preference Optimization with Importance Sampling

no code implementations • 27 Dec 2023 • Zaifan Jiang, Xing Huang, Chao Wei

Reinforcement Learning from Human Feedback (RLHF) is a model-based algorithm to optimize preference learning, which first fits a reward model for preference scores and then optimizes the generating policy with an on-policy PPO algorithm to maximize the reward.
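As a rough illustration of the reward-model step described above (not code from the paper), preference scores in RLHF reward models are commonly turned into pairwise preference probabilities via a Bradley-Terry style sigmoid of the reward difference. A minimal sketch, with the function name `preference_prob` chosen here for illustration:

```python
import math

def preference_prob(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry model: probability that the 'chosen' response is
    preferred over the 'rejected' one, given scalar scores from a fitted
    reward model. Equal rewards give probability 0.5."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
```

The fitted reward model's scalar outputs then serve as the reward signal that the on-policy PPO step maximizes.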

Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

no code implementations • 2 Mar 2023 • Shuai Xiao, Le Guo, Zaifan Jiang, Lei Lv, Yuanbo Chen, Jun Zhu, Shuang Yang

Furthermore, we show that the dual problem can be solved by policy learning, with the optimal dual variable being found efficiently via bisection search (i.e., by taking advantage of the monotonicity).
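As a generic sketch of the bisection idea mentioned above (not the paper's implementation), if the budget-constraint violation is assumed to be monotonically decreasing in the dual variable, the optimal dual variable is the root of that function and can be bracketed and halved until convergence. The function name `bisect_dual` and its signature are illustrative:

```python
from typing import Callable

def bisect_dual(constraint_violation: Callable[[float], float],
                lo: float, hi: float, tol: float = 1e-8) -> float:
    """Find the dual variable at which the (assumed monotonically
    decreasing) constraint-violation function crosses zero, i.e. where
    the budget constraint is exactly met, via bisection search."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if constraint_violation(mid) > 0:
            lo = mid  # constraint still violated: dual variable too small
        else:
            hi = mid  # constraint satisfied: dual variable can shrink
    return 0.5 * (lo + hi)
```

Each evaluation of `constraint_violation` would, in a model-based CMDP setting, involve solving the unconstrained policy-learning subproblem for that dual value; bisection keeps the number of such solves logarithmic in the required precision.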

Counterfactual · Marketing

Multi-Scale User Behavior Network for Entire Space Multi-Task Learning

no code implementations • 3 Aug 2022 • Jiarui Jin, Xianyu Chen, Weinan Zhang, Yuanbo Chen, Zaifan Jiang, Zekun Zhu, Zhewen Su, Yong Yu

Modelling the user's multiple behaviors is an essential part of modern e-commerce, whose widely adopted application is to jointly optimize click-through rate (CTR) and conversion rate (CVR) predictions.

Multi-Task Learning · Survival Analysis

Who to Watch Next: Two-side Interactive Networks for Live Broadcast Recommendation

no code implementations • 9 Feb 2022 • Jiarui Jin, Xianyu Chen, Yuanbo Chen, Weinan Zhang, Renting Rui, Zaifan Jiang, Zhewen Su, Yong Yu

With the prevalence of live broadcast business nowadays, a new type of recommendation service, called live broadcast recommendation, is widely used in many mobile e-commerce Apps.

Retrieval
