Search Results for author: Deheng Ye

Found 36 papers, 16 papers with code

Affordable Generative Agents

1 code implementation • 3 Feb 2024 • Yangbin Yu, Qin Zhang, Junyou Li, Qiang Fu, Deheng Ye

The emergence of large language models (LLMs) has significantly advanced the simulation of believable interactive agents.

More Agents Is All You Need

no code implementations • 3 Feb 2024 • Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye

We find that, simply via a sampling-and-voting method, the performance of large language models (LLMs) scales with the number of agents instantiated.
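
The sampling-and-voting idea is simple enough to sketch in a few lines. The snippet below is an illustrative reduction, not the paper's exact setup: query the same model several times and return the most common answer (the `noisy_llm` stand-in and the plain majority vote are assumptions for the sake of a runnable example).

```python
import random
from collections import Counter

def sample_and_vote(ask, prompt, n_agents=10):
    """Query `ask` (any LLM-call function) n_agents times and majority-vote the answers."""
    answers = [ask(prompt) for _ in range(n_agents)]
    # Ties are broken by whichever answer was seen first.
    return Counter(answers).most_common(1)[0][0]

# Toy usage with a noisy stand-in for an LLM that is right about 70% of the time.
noisy_llm = lambda prompt: random.choice(["42"] * 7 + ["41", "43", "44"])
print(sample_and_vote(noisy_llm, "What is 6 * 7?"))
```

Per the abstract's finding, raising `n_agents` tends to improve accuracy, at the cost of proportionally more queries.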

HGAttack: Transferable Heterogeneous Graph Adversarial Attack

no code implementations • 18 Jan 2024 • He Zhao, Zhiwei Zeng, Yongwei Wang, Deheng Ye, Chunyan Miao

Heterogeneous Graph Neural Networks (HGNNs) are increasingly recognized for their performance in areas like the web and e-commerce, where resilience against adversarial attacks is crucial.

Adversarial Attack

Replay-enhanced Continual Reinforcement Learning

no code implementations • 20 Nov 2023 • Tiantian Zhang, Kevin Zehua Shen, Zichuan Lin, Bo Yuan, Xueqian Wang, Xiu Li, Deheng Ye

On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting.

Continual Learning • reinforcement-learning

LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay

1 code implementation • 23 Oct 2023 • Yihuai Lan, Zhiqiang Hu, Lei Wang, Yang Wang, Deheng Ye, Peilin Zhao, Ee-Peng Lim, Hui Xiong, Hao Wang

To achieve this goal, we adopt Avalon, a representative communication game, as the environment and use system prompts to guide LLM agents to play the game.

Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints

1 code implementation • 24 Aug 2023 • Hanchi Huang, Li Shen, Deheng Ye, Wei Liu

We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback.

Multi-Armed Bandits

RLTF: Reinforcement Learning from Unit Test Feedback

1 code implementation • 10 Jul 2023 • Jiate Liu, Yiqin Zhu, Kaiwen Xiao, Qiang Fu, Xiao Han, Wei Yang, Deheng Ye

The goal of program synthesis, or code generation, is to generate executable code based on given descriptions.

Code Generation • Program Synthesis • +2

Future-conditioned Unsupervised Pretraining for Decision Transformer

1 code implementation • 26 May 2023 • Zhihui Xie, Zichuan Lin, Deheng Ye, Qiang Fu, Wei Yang, Shuai Li

While promising, return conditioning is limited to training data labeled with rewards and therefore faces challenges in learning from unsupervised data.

Decision Making • Reinforcement Learning (RL)

Deploying Offline Reinforcement Learning with Human Feedback

no code implementations • 13 Mar 2023 • Ziniu Li, Ke Xu, Liu Liu, Lanqing Li, Deheng Ye, Peilin Zhao

To address this issue, we propose an alternative framework that involves a human supervising the RL models and providing additional feedback in the online deployment phase.

Decision Making • Model Selection • +3

Sample Dropout: A Simple yet Effective Variance Reduction Technique in Deep Policy Optimization

1 code implementation • 5 Feb 2023 • Zichuan Lin, Xiapeng Wu, Mingfei Sun, Deheng Ye, Qiang Fu, Wei Yang, Wei Liu

Recent success in Deep Reinforcement Learning (DRL) methods has shown that policy optimization with respect to an off-policy distribution via importance sampling is effective for sample reuse.
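
As a rough illustration of sample reuse via importance sampling (not this paper's specific sample-dropout procedure), the sketch below weights each transition by the probability ratio between the current policy and the data-collecting policy, and drops transitions whose ratio is extreme to keep the variance in check; the `ratio_bound` value is an assumption.

```python
import torch

def is_policy_loss(logp_new, logp_old, advantages, ratio_bound=5.0):
    """Importance-sampled surrogate loss that drops extreme-ratio samples.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: estimated A(s, a).
    """
    ratio = torch.exp(logp_new - logp_old)          # importance weights
    keep = (ratio < ratio_bound).float()            # discard high-variance samples
    surrogate = ratio * advantages * keep
    return -surrogate.sum() / keep.sum().clamp(min=1.0)
```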

Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning

no code implementations • 20 Jan 2023 • Haoxuan Pan, Deheng Ye, Xiaoming Duan, Qiang Fu, Wei Yang, Jianping He, Mingfei Sun

We show that, despite such state distribution shift, the policy gradient estimation bias can be reduced in the following three ways: 1) a small learning rate; 2) an adaptive-learning-rate-based optimizer; and 3) KL regularization.

Continuous Control • reinforcement-learning • +1
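
Of the three bias-reduction knobs listed in the abstract, KL regularization is the easiest to show in code. The sketch below adds a sample-based KL penalty to a vanilla policy-gradient surrogate; the coefficient and the simple KL estimator are assumptions, not the paper's exact formulation.

```python
import torch

def kl_regularized_pg_loss(logp_new, logp_old, advantages, kl_coef=0.1):
    """Vanilla policy-gradient surrogate plus a KL penalty toward the old policy.

    The penalty keeps the updated policy close to the data-collecting one,
    which also limits the state-distribution shift discussed above.
    """
    pg_loss = -(logp_new * advantages).mean()
    approx_kl = (logp_old - logp_new).mean()        # sample-based KL estimate
    return pg_loss + kl_coef * approx_kl
```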

A Survey on Transformers in Reinforcement Learning

no code implementations • 8 Jan 2023 • Wenzhe Li, Hao Luo, Zichuan Lin, Chongjie Zhang, Zongqing Lu, Deheng Ye

Transformer has been considered the dominating neural architecture in NLP and CV, mostly under supervised settings.

reinforcement-learning • Reinforcement Learning (RL)

RLogist: Fast Observation Strategy on Whole-slide Images with Deep Reinforcement Learning

1 code implementation • 4 Dec 2022 • Boxuan Zhao, Jun Zhang, Deheng Ye, Jian Cao, Xiao Han, Qiang Fu, Wei Yang

Most of the existing methods rely on a multiple instance learning framework that requires densely sampling local patches at high magnification.

Benchmarking • Decision Making • +4

Pretraining in Deep Reinforcement Learning: A Survey

no code implementations • 8 Nov 2022 • Zhihui Xie, Zichuan Lin, Junyou Li, Shuai Li, Deheng Ye

The past few years have seen rapid progress in combining reinforcement learning (RL) with deep learning.

reinforcement-learning • Reinforcement Learning (RL)

Curriculum-based Asymmetric Multi-task Reinforcement Learning

1 code implementation • 7 Nov 2022 • Hanchi Huang, Deheng Ye, Li Shen, Wei Liu

To mitigate the negative influence of customizing the one-off training order in curriculum-based AMTL, CAMRL switches its training mode between parallel single-task RL and asymmetric multi-task RL (MTRL), according to an indicator regarding the training time, the overall performance, and the performance gap among tasks.

Multi-Task Learning • reinforcement-learning • +1

Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation

1 code implementation • 19 Oct 2022 • Chengqian Gao, Ke Xu, Liu Liu, Deheng Ye, Peilin Zhao, Zhiqiang Xu

A promising paradigm for offline reinforcement learning (RL) is to constrain the learned policy to stay close to the dataset behaviors, known as policy constraint offline RL.

D4RL • Offline RL • +2
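
One common instantiation of policy-constraint offline RL is a behavior-regularized actor loss in the style of TD3+BC. The sketch below shows that generic form rather than this paper's gradient-penalty and constraint-relaxation variant, and the `alpha` weighting is an assumption.

```python
import torch

def constrained_policy_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Maximize Q while penalizing deviation from the dataset (behavior) actions.

    q_values: Q(s, pi(s)) for the current policy's actions; the MSE term keeps
    the learned policy close to the behaviors seen in the offline dataset.
    """
    lam = alpha / q_values.abs().mean().detach()    # balance the RL term against BC
    bc_penalty = ((policy_actions - dataset_actions) ** 2).mean()
    return -lam * q_values.mean() + bc_penalty
```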

More Centralized Training, Still Decentralized Execution: Multi-Agent Conditional Policy Factorization

1 code implementation • 26 Sep 2022 • Jiangxing Wang, Deheng Ye, Zongqing Lu

To this end, we propose multi-agent conditional policy factorization (MACPF), which takes more centralized training but still enables decentralized execution.

Multi-agent Reinforcement Learning

Revisiting Discrete Soft Actor-Critic

1 code implementation • 21 Sep 2022 • Haibin Zhou, Zichuan Lin, Junyou Li, Qiang Fu, Wei Yang, Deheng Ye

We study the adaptation of soft actor-critic (SAC) from continuous action spaces to discrete action spaces.

Atari Games • Q-Learning
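
In a discrete action space, the SAC soft value and entropy terms can be computed exactly by summing over the categorical policy instead of sampling. The sketch below illustrates that generic adaptation only; the temperature `alpha` is an assumption, and the paper's proposed changes go beyond this baseline.

```python
import torch
import torch.nn.functional as F

def discrete_soft_value(q_values, policy_logits, alpha=0.2):
    """Soft state value for SAC with a categorical (discrete) policy.

    q_values: Q(s, a) for every discrete action; policy_logits: unnormalized
    action preferences. Computes V(s) = E_pi[Q(s, a) - alpha * log pi(a|s)]
    by an exact sum over actions rather than by sampling.
    """
    probs = F.softmax(policy_logits, dim=-1)
    log_probs = F.log_softmax(policy_logits, dim=-1)
    return (probs * (q_values - alpha * log_probs)).sum(dim=-1)
```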

Dynamics-Adaptive Continual Reinforcement Learning via Progressive Contextualization

no code implementations • 1 Sep 2022 • Tiantian Zhang, Zichuan Lin, Yuxing Wang, Deheng Ye, Qiang Fu, Wei Yang, Xueqian Wang, Bin Liang, Bo Yuan, Xiu Li

A key challenge of continual reinforcement learning (CRL) in dynamic environments is to promptly adapt the RL agent's behavior as the environment changes over its lifetime, while minimizing the catastrophic forgetting of the learned information.

Bayesian Inference • Knowledge Distillation • +3

Quantized Adaptive Subgradient Algorithms and Their Applications

no code implementations • 11 Aug 2022 • Ke Xu, Jianqiao Wangni, Yifan Zhang, Deheng Ye, Jiaxiang Wu, Peilin Zhao

Therefore, a threshold quantization strategy with a relatively small error is adopted in QCMD adagrad and QRDA adagrad to improve the signal-to-noise ratio and preserve the sparsity of the model.

Quantization
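
As a rough illustration of threshold quantization (not necessarily the exact scheme used by QCMD adagrad or QRDA adagrad), the sketch below zeroes entries below a threshold to preserve sparsity and rounds the survivors to a fixed grid; the `threshold` and `step` values are assumptions.

```python
import torch

def threshold_quantize(x, threshold=1e-3, step=1e-2):
    """Zero out small entries (preserving sparsity) and round the remaining
    entries to a fixed grid; a coarse stand-in for threshold quantization."""
    mask = x.abs() >= threshold
    quantized = torch.round(x / step) * step
    return torch.where(mask, quantized, torch.zeros_like(x))
```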

GPN: A Joint Structural Learning Framework for Graph Neural Networks

no code implementations • 12 May 2022 • Qianggang Ding, Deheng Ye, Tingyang Xu, Peilin Zhao

To the best of our knowledge, our method is the first GNN-based bilevel optimization framework for resolving this task.

Bilevel Optimization

MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

no code implementations • 17 Feb 2022 • Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang, Weijun Hong, Zhongyue Huang, Haicheng Chen, Guangjun Zeng, Yue Lin, Vincent Micheli, Eloi Alonso, François Fleuret, Alexander Nikulin, Yury Belousov, Oleg Svidchenko, Aleksei Shpilman

With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers.

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

no code implementations • 7 Dec 2021 • Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang

To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration.

Efficient Exploration • Hierarchical Reinforcement Learning • +4

Coordinated Proximal Policy Optimization

1 code implementation • NeurIPS 2021 • Zifan Wu, Chao Yu, Deheng Ye, Junge Zhang, Haiyin Piao, Hankz Hankui Zhuo

We present Coordinated Proximal Policy Optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting.

Starcraft • Starcraft II

Learning Diverse Policies in MOBA Games via Macro-Goals

no code implementations • NeurIPS 2021 • Yiming Gao, Bei Shi, Xueying Du, Liang Wang, Guangwei Chen, Zhenjie Lian, Fuhao Qiu, Guoan Han, Weixuan Wang, Deheng Ye, Qiang Fu, Wei Yang, Lanxiao Huang

Recently, many researchers have made successful progress in building the AI systems for MOBA-game-playing with deep reinforcement learning, such as on Dota 2 and Honor of Kings.

Dota 2

TiKick: Towards Playing Multi-agent Football Full Games from Single-agent Demonstrations

1 code implementation • 9 Oct 2021 • Shiyu Huang, Wenze Chen, Longfei Zhang, Shizhen Xu, Ziyang Li, Fengming Zhu, Deheng Ye, Ting Chen, Jun Zhu

To the best of our knowledge, TiKick is the first learning-based AI system that can take over the multi-agent Google Research Football full game, while previous work could either control a single agent or experiment on toy academic scenarios.

Starcraft • Starcraft II

Boosting Offline Reinforcement Learning with Residual Generative Modeling

no code implementations • 19 Jun 2021 • Hua Wei, Deheng Ye, Zhao Liu, Hao Wu, Bo Yuan, Qiang Fu, Wei Yang, Zhenhui Li

While most research focuses on the state-action function part through reducing the bootstrapping error in value function approximation induced by the distribution shift of training data, the effects of error propagation in generative modeling have been neglected.

Offline RL • Q-Learning • +2

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks

1 code implementation • 13 May 2021 • Menghui Zhu, Minghuan Liu, Jian Shen, Zhicheng Zhang, Sheng Chen, Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, Wei Yang

In Goal-oriented Reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem.
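
The hindsight relabeling the abstract refers to can be sketched in a few lines in the style of HER's "future" strategy (MapGo's model-assisted variant goes further than this); the trajectory format and the `reward_fn` signature below are assumptions.

```python
import random

def relabel_with_hindsight(trajectory, reward_fn):
    """HER-style 'future' relabeling for sparse-reward, goal-oriented tasks.

    trajectory: list of (state, action, next_state, goal) tuples.
    reward_fn(next_state, goal) -> float recomputes the reward for a new goal.
    """
    relabeled = []
    for t, (s, a, s_next, _) in enumerate(trajectory):
        # Pretend a state actually reached later on was the goal all along,
        # so the transition carries a non-sparse learning signal.
        future_state = random.choice(trajectory[t:])[2]
        relabeled.append((s, a, s_next, future_state, reward_fn(s_next, future_state)))
    return relabeled
```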

Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization

no code implementations • 5 Jan 2021 • Jiamou Sun, Zhenchang Xing, Hao Guo, Deheng Ye, Xiaohong Li, Xiwei Xu, Liming Zhu

The extracted aspects from an ExploitDB post are then composed into a CVE description according to the suggested CVE description templates, which is information that must be provided when requesting new CVEs.

Extractive Summarization • Text Summarization

Which Heroes to Pick? Learning to Draft in MOBA Games with Neural Networks and Tree Search

no code implementations • 18 Dec 2020 • Sheng Chen, Menghui Zhu, Deheng Ye, Weinan Zhang, Qiang Fu, Wei Yang

Hero drafting is essential in MOBA game playing as it builds the team of each side and directly affects the match outcome.

Towards Playing Full MOBA Games with Deep Reinforcement Learning

no code implementations • NeurIPS 2020 • Deheng Ye, Guibin Chen, Wen Zhang, Sheng Chen, Bo Yuan, Bo Liu, Jia Chen, Zhao Liu, Fuhao Qiu, Hongsheng Yu, Yinyuting Yin, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i.e., lineups, when expanding the hero pool; this is why OpenAI's Dota AI limits play to a pool of only 17 heroes.

Dota 2 • reinforcement-learning • +1

Supervised Learning Achieves Human-Level Performance in MOBA Games: A Case Study of Honor of Kings

no code implementations • 25 Nov 2020 • Deheng Ye, Guibin Chen, Peilin Zhao, Fuhao Qiu, Bo Yuan, Wen Zhang, Sheng Chen, Mingfei Sun, Xiaoqian Li, Siqin Li, Jing Liang, Zhenjie Lian, Bei Shi, Liang Wang, Tengfei Shi, Qiang Fu, Wei Yang, Lanxiao Huang

Unlike prior attempts, we integrate the macro-strategy and the micromanagement of MOBA-game-playing into neural networks in a supervised and end-to-end manner.

Relation-Aware Transformer for Portfolio Policy Learning

2 code implementations • IJCAI 2020 • Ke Xu, Yifan Zhang, Deheng Ye, Peilin Zhao, Mingkui Tan

One of the key issues is how to represent the non-stationary price series of assets in a portfolio, which is important for portfolio decisions.

Relation
