no code implementations • 27 Dec 2023 • Zaifan Jiang, Xing Huang, Chao Wei
Reinforcement Learning from Human Feedback (RLHF) is a model-based algorithm for preference learning: it first fits a reward model to preference scores, then optimizes the generation policy with the on-policy PPO algorithm to maximize the modeled reward.
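The two-stage pipeline described in this entry can be sketched in miniature: stage 1 fits a Bradley-Terry reward model to preference pairs, stage 2 improves the policy against the learned reward. In this toy one-dimensional sketch, plain gradient ascent stands in for the on-policy PPO step; all names and the scalar setup are illustrative assumptions, not taken from the paper.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_reward_model(pairs, lr=0.1, steps=200):
    """Fit w so that reward(y) = w * y ranks the preferred item higher.

    pairs: list of (y_preferred, y_rejected) scalars.
    Loss per pair (Bradley-Terry): -log sigmoid(w * (y_pref - y_rej)).
    """
    w = 0.0
    for _ in range(steps):
        for y_pref, y_rej in pairs:
            d = y_pref - y_rej
            w += lr * (1.0 - sigmoid(w * d)) * d  # gradient ascent on log-likelihood
    return w

def improve_policy(w, theta=0.0, lr=0.05, steps=100):
    """Push the policy's (deterministic) output theta toward higher modeled reward.

    Plain gradient ascent here is a stand-in for PPO: with reward(theta) = w * theta,
    the gradient with respect to theta is simply w.
    """
    for _ in range(steps):
        theta += lr * w
    return theta

# Preferences consistently favor larger y, so the learned w is positive
# and the policy output drifts upward.
pairs = [(1.0, -1.0), (0.5, -0.2), (2.0, 0.1)]
w = fit_reward_model(pairs)
theta = improve_policy(w)
```

The separation of the two stages mirrors the model-based structure noted above: the reward model is fit once from preferences, and only its scores (not the raw preferences) drive the policy update.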
no code implementations • 3 Mar 2023 • Shuai Xiao, Zaifan Jiang, Shuang Yang
Finding optimal configurations in a geometric space is a key challenge in many technological disciplines.
no code implementations • 2 Mar 2023 • Shuai Xiao, Le Guo, Zaifan Jiang, Lei Lv, Yuanbo Chen, Jun Zhu, Shuang Yang
Furthermore, we show that the dual problem can be solved by policy learning, with the optimal dual variable found efficiently via bisection search (i.e., by exploiting its monotonicity).
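The bisection step can be sketched generically: assuming the constraint gap is monotonically decreasing in the dual variable (the monotonicity the entry refers to), repeatedly halving a bracketing interval locates the optimal dual variable. The function names and the toy gap below are illustrative assumptions, not taken from the paper.

```python
def bisect_dual(constraint_gap, lo=0.0, hi=100.0, tol=1e-6):
    """Locate the dual variable where a monotone constraint gap crosses zero.

    Assumes constraint_gap is decreasing in the dual variable, with
    constraint_gap(lo) > 0 > constraint_gap(hi), so a root is bracketed.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if constraint_gap(mid) > 0.0:
            lo = mid  # constraint still violated: raise the penalty
        else:
            hi = mid  # constraint satisfied: lower the penalty
    return 0.5 * (lo + hi)

# Toy monotone gap with its root at 3.0, standing in for the true dual objective.
lam = bisect_dual(lambda l: 3.0 - l)
```

Each evaluation of the gap would, in the paper's setting, involve solving the inner policy-learning problem at that fixed dual variable; monotonicity is what lets a one-dimensional bisection replace a full search.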
no code implementations • 3 Aug 2022 • Jiarui Jin, Xianyu Chen, Weinan Zhang, Yuanbo Chen, Zaifan Jiang, Zekun Zhu, Zhewen Su, Yong Yu
Modelling users' multiple behaviors is an essential part of modern e-commerce; a widely adopted application is jointly optimizing click-through rate (CTR) and conversion rate (CVR) predictions.
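Joint CTR/CVR prediction is commonly handled by factoring the click-and-conversion probability over a shared representation as pCTCVR = pCTR × pCVR, since a conversion can only follow a click. The sketch below illustrates that decomposition with toy linear heads; the weights and names are illustrative assumptions, not the paper's architecture.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(features, w_shared, w_ctr, w_cvr):
    """Score one impression with a shared bottom and two task heads.

    Returns (pCTR, pCTCVR), where pCTCVR = pCTR * pCVR encodes that a
    conversion can only happen after a click.
    """
    hidden = [f * w for f, w in zip(features, w_shared)]  # shared representation
    p_ctr = sigmoid(sum(h * w for h, w in zip(hidden, w_ctr)))
    p_cvr = sigmoid(sum(h * w for h, w in zip(hidden, w_cvr)))
    return p_ctr, p_ctr * p_cvr

# Toy impression: the joint click-and-convert estimate never exceeds
# the click estimate alone.
p_ctr, p_ctcvr = predict([1.0, 0.5], [0.8, -0.3], [0.4, 0.9], [0.2, 0.1])
```

Sharing the bottom representation between the two heads is what makes the optimization joint: gradients from both the CTR and CVR objectives flow into the same features.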
no code implementations • 9 Feb 2022 • Jiarui Jin, Xianyu Chen, Yuanbo Chen, Weinan Zhang, Renting Rui, Zaifan Jiang, Zhewen Su, Yong Yu
With the prevalence of the live broadcast business, a new type of recommendation service, called live broadcast recommendation, is widely used in many mobile e-commerce apps.