Search Results for author: Junwu Xiong

Found 10 papers, 2 papers with code

Hummer: Towards Limited Competitive Preference Dataset

no code implementations • 19 May 2024 • Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng

Preference datasets are essential for incorporating human preferences into pre-trained language models, playing a key role in the success of Reinforcement Learning from Human Feedback.

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

no code implementations • 11 Jan 2024 • Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, Xinhao Deng, Yunpeng Liu, Qinglin Zhang, Ziyi Qiu, Peiyang Li, Zhixing Tan, Junwu Xiong, Xinyu Kong, Zujie Wen, Ke Xu, Qi Li

Based on this, we propose a comprehensive taxonomy, which systematically analyzes potential risks associated with each module of an LLM system and discusses the corresponding mitigation strategies.

Language Modelling • Large Language Model

Unit Ball Model for Embedding Hierarchical Structures in the Complex Hyperbolic Space

1 code implementation • NeurIPS 2021 • Huiru Xiao, Caigao Jiang, Yangqiu Song, James Zhang, Junwu Xiong

Specifically, we propose to learn the embeddings of hierarchically structured data in the unit ball model of the complex hyperbolic space.

Representation Learning

Model Embedding Model-Based Reinforcement Learning

no code implementations • 16 Jun 2020 • Xiaoyu Tan, Chao Qu, Junwu Xiong, James Zhang

Model-based reinforcement learning (MBRL) has shown its advantages in sample-efficiency over model-free reinforcement learning (MFRL).

Model-based Reinforcement Learning • reinforcement-learning • +1

Variational Policy Propagation for Multi-agent Reinforcement Learning

no code implementations • 19 Apr 2020 • Chao Qu, Hui Li, Chang Liu, Junwu Xiong, James Zhang, Wei Chu, Weiqiang Wang, Yuan Qi, Le Song

We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.

Multi-agent Reinforcement Learning • reinforcement-learning • +2

S2VG: Soft Stochastic Value Gradient method

no code implementations • 25 Sep 2019 • Xiaoyu Tan, Chao Qu, Junwu Xiong, James Zhang

In this paper, we propose a simple and elegant model-based reinforcement learning algorithm called soft stochastic value gradient method (S2VG).

Model-based Reinforcement Learning • reinforcement-learning • +1

Cost-Effective Incentive Allocation via Structured Counterfactual Inference

no code implementations • 7 Feb 2019 • Romain Lopez, Chenchen Li, Xiang Yan, Junwu Xiong, Michael I. Jordan, Yuan Qi, Le Song

We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback.

counterfactual • Counterfactual Inference • +2

Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning

no code implementations • NeurIPS 2019 • Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, Junwu Xiong

To the best of our knowledge, it is the first MARL algorithm with convergence guarantee in the control, off-policy and non-linear function approximation setting.

Multi-agent Reinforcement Learning • reinforcement-learning • +1

Reinforcement Learning for Uplift Modeling

1 code implementation • 26 Nov 2018 • Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, Junwu Xiong

Uplift modeling aims to directly model the incremental impact of a treatment on an individual response.

reinforcement-learning • Reinforcement Learning (RL)

Latent Dirichlet Allocation for Internet Price War

no code implementations • 23 Aug 2018 • Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, Junwu Xiong

Then we develop a variant of Latent Dirichlet Allocation (LDA) to infer latent variables under the current market environment, which represents the preferences of customers and strategies of competitors.
