1 code implementation • 18 Apr 2024 • Yongcheng Zeng, Guoqing Liu, Weiyu Ma, Ning Yang, Haifeng Zhang, Jun Wang
Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions.
no code implementations • 14 Mar 2024 • Qirui Mi, Zhiyu Zhao, Siyu Xia, Yan Song, Jun Wang, Haifeng Zhang
Effective macroeconomic policies play a crucial role in promoting economic growth and social stability.
1 code implementation • 19 Dec 2023 • Weiyu Ma, Qirui Mi, Xue Yan, Yuqiao Wu, Runji Lin, Haifeng Zhang, Jun Wang
StarCraft II is a challenging benchmark for AI agents due to the necessity of both precise micro level operations and strategic macro awareness.
no code implementations • 18 Dec 2023 • Chengyuan Zhu, Yiyuan Yang, Kaixiang Yang, Haifeng Zhang, Qinmin Yang, C. L. Philip Chen
This refinement is crucial in effectively identifying genuine threats to pipelines, thus enhancing the safety of energy transportation.
no code implementations • 27 Oct 2023 • Xue Yan, Yan Song, Xinyu Cui, Filippos Christianos, Haifeng Zhang, David Henry Mguni, Jun Wang
To that purpose, we offer a new leader-follower bilevel framework that is capable of learning to ask relevant questions (prompts) and subsequently undertaking reasoning to guide the learning of actions.
no code implementations • 24 Jun 2023 • Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Haifeng Zhang, Weinan Zhang
Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e. g., GPT-3 and Swin Transformer.
1 code implementation • 16 May 2023 • Yan Song, He Jiang, Zheng Tian, Haifeng Zhang, Yingping Zhang, Jiangcheng Zhu, Zonghong Dai, Weinan Zhang, Jun Wang
Few multi-agent reinforcement learning (MARL) research on Google Research Football (GRF) focus on the 11v11 multi-agent full-game scenario and to the best of our knowledge, no open benchmark on this scenario has been released to the public.
no code implementations • 15 Nov 2022 • Runji Lin, Ye Li, Xidong Feng, Zhaowei Zhang, Xian Hong Wu Fung, Haifeng Zhang, Jun Wang, Yali Du, Yaodong Yang
Firstly, we propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional policy generation.
1 code implementation • 12 Jan 2022 • Xue Yan, Yali Du, Binxin Ru, Jun Wang, Haifeng Zhang, Xu Chen
The Elo rating system is widely adopted to evaluate the skills of (chess) game and sports players.
1 code implementation • 31 Dec 2021 • Xidong Feng, Bo Liu, Jie Ren, Luo Mai, Rui Zhu, Haifeng Zhang, Jun Wang, Yaodong Yang
Gradient-based Meta-RL (GMRL) refers to methods that maintain two-level optimisation procedures wherein the outer-loop meta-learner guides the inner-loop gradient-based reinforcement learner to achieve fast adaptations.
1 code implementation • 6 Dec 2021 • Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu
In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL.
no code implementations • 28 Oct 2021 • Chenguang Wang, Yaodong Yang, Oliver Slumbers, Congying Han, Tiande Guo, Haifeng Zhang, Jun Wang
In this paper, we introduce a two-player zero-sum framework between a trainable \emph{Solver} and a \emph{Data Generator} to improve the generalization ability of deep learning-based solvers for Traveling Salesman Problem (TSP).
no code implementations • 29 Sep 2021 • Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xi yun Li, Haifeng Zhang, Ying Wen, Weinan Zhang, Jun Wang, Bo Xu
Offline reinforcement learning leverages static datasets to learn optimal policies with no necessity to access the environment.
Multi-agent Reinforcement Learning reinforcement-learning +2
1 code implementation • NeurIPS 2021 • Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang
In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents.
no code implementations • 1 Jan 2021 • Yali Du, Yifan Zhao, Meng Fang, Jun Wang, Gangyan Xu, Haifeng Zhang
Dealing with multi-agent control in networked systems is one of the biggest challenges in Reinforcement Learning (RL) and limited success has been presented compared to recent deep reinforcement learning in single-agent domain.
1 code implementation • 9 Dec 2020 • Yunfei Liu, Yang Yang, Xianyu Chen, Jian Shen, Haifeng Zhang, Yong Yu
Knowledge tracing (KT) defines the task of predicting whether students can correctly answer questions based on their historical response.
Ranked #3 on Knowledge Tracing on EdNet
no code implementations • 10 Sep 2019 • Liheng Chen, Hongyi Guo, Yali Du, Fei Fang, Haifeng Zhang, Yaoming Zhu, Ming Zhou, Wei-Nan Zhang, Qing Wang, Yong Yu
Although existing works formulate this problem into a centralized learning with decentralized execution framework, which avoids the non-stationary problem in training, their decentralized execution paradigm limits the agents' capability to coordinate.
Multi-agent Reinforcement Learning reinforcement-learning +1
1 code implementation • 8 Sep 2019 • Haifeng Zhang, Weizhe Chen, Zeren Huang, Minne Li, Yaodong Yang, Wei-Nan Zhang, Jun Wang
Coordination is one of the essential problems in multi-agent systems.
Multiagent Systems
no code implementations • 14 Nov 2018 • Haifeng Zhang, Zilong Guo, Han Cai, Chris Wang, Wei-Nan Zhang, Yong Yu, Wenxin Li, Jun Wang
With the rapid growth of the express industry, intelligent warehouses that employ autonomous robots for carrying parcels have been widely used to handle the vast express volume.
no code implementations • 4 Jan 2018 • Yi Zhang, Houjun Huang, Haifeng Zhang, Liao Ni, Wei Xu, Nasir Uddin Ahmed, Md. Shakil Ahmed, Yilun Jin, Yingjie Chen, Jingxuan Wen, Wenxin Li
The development of finger vein recognition algorithms heavily depends on large-scale real-world data sets.
no code implementations • 5 Jul 2017 • Haifeng Zhang, Jun Wang, Zhiming Zhou, Wei-Nan Zhang, Ying Wen, Yong Yu, Wenxin Li
In typical reinforcement learning (RL), the environment is assumed given and the goal of the learning is to identify an optimal policy for the agent taking actions through its interactions with the environment.
no code implementations • 30 Aug 2016 • Haifeng Zhang, Yevgeniy Vorobeychik
Innovation diffusion has been studied extensively in a variety of disciplines, including sociology, economics, marketing, ecology, and computer science.