no code implementations • 19 May 2024 • Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng
Preference datasets are essential for incorporating human preferences into pre-trained language models, playing a key role in the success of Reinforcement Learning from Human Feedback.
no code implementations • 11 Jan 2024 • Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, Xinhao Deng, Yunpeng Liu, Qinglin Zhang, Ziyi Qiu, Peiyang Li, Zhixing Tan, Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li
Based on this, we propose a comprehensive taxonomy, which systematically analyzes potential risks associated with each module of an LLM system and discusses the corresponding mitigation strategies.
1 code implementation • NeurIPS 2021 • Huiru Xiao, Caigao Jiang, Yangqiu Song, James Zhang, Junwu Xiong
Specifically, we propose to learn the embeddings of hierarchically structured data in the unit ball model of the complex hyperbolic space.
no code implementations • 16 Jun 2020 • Xiaoyu Tan, Chao Qu, Junwu Xiong, James Zhang
Model-based reinforcement learning (MBRL) has shown its advantages in sample-efficiency over model-free reinforcement learning (MFRL).
Model-based Reinforcement Learning reinforcement-learning +1
no code implementations • 19 Apr 2020 • Chao Qu, Hui Li, Chang Liu, Junwu Xiong, James Zhang, Wei Chu, Weiqiang Wang, Yuan Qi, Le Song
We propose a \emph{collaborative} multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a \emph{joint} policy through the interactions over agents.
Multi-agent Reinforcement Learning reinforcement-learning +2
no code implementations • 25 Sep 2019 • Xiaoyu Tan, Chao Qu, Junwu Xiong, James Zhang
In this paper, we propose a simple and elegant model-based reinforcement learning algorithm called soft stochastic value gradient method (S2VG).
Model-based Reinforcement Learning reinforcement-learning +1
no code implementations • 7 Feb 2019 • Romain Lopez, Chenchen Li, Xiang Yan, Junwu Xiong, Michael. I. Jordan, Yuan Qi, Le Song
We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback.
no code implementations • NeurIPS 2019 • Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong
To the best of our knowledge, it is the first MARL algorithm with convergence guarantee in the control, off-policy and non-linear function approximation setting.
Multi-agent Reinforcement Learning reinforcement-learning +1
1 code implementation • 26 Nov 2018 • Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, Junwu Xiong
Uplift modeling aims to directly model the incremental impact of a treatment on an individual response.
no code implementations • 23 Aug 2018 • Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, Junwu Xiong
Then we develop a variant of Latent Dirichlet Allocation (LDA) to infer latent variables under the current market environment, which represents the preferences of customers and strategies of competitors.