1 code implementation • 10 Mar 2024 • Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu, Weinan Zhang
Many of these works utilize in-context examples to achieve generalization without the need for fine-tuning, while few have considered how to select and effectively utilize these examples.
no code implementations • 5 Mar 2024 • Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang, Bin Lu, Haiwen Wang, Yi Xu, Cheng Deng, Shao Zhang, Huquan Kang, Xingli Wang, Qi Li, Zhixin Guo, Jiexing Qi, Pan Liu, Yuyang Ren, Lyuwen Wu, Jungang Yang, Jianping Zhou, Chenghu Zhou
The exponential growth of scientific literature requires effective management and extraction of valuable insights.
no code implementations • 29 Feb 2024 • Jingxiao Chen, Weiji Xie, Weinan Zhang, Yong Yu, Ying Wen
First, without knowledge of the game structure, it is impossible to interact with the opponents and conduct self-play, a major learning paradigm for competitive games.
1 code implementation • 27 Feb 2024 • Siyuan Guo, Cheng Deng, Ying Wen, Hechang Chen, Yi Chang, Jun Wang
In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle, and facilitate consistent performance improvement through the feedback mechanism.
no code implementations • 19 Feb 2024 • Yang Li, WenHao Zhang, Jianhong Wang, Shao Zhang, Yali Du, Ying Wen, Wei Pan
The visualization of learning dynamics effectively demonstrates that AgA successfully achieves alignment between individual and collective objectives.
no code implementations • 11 Feb 2024 • Xidong Feng, Ziyu Wan, Mengyue Yang, Ziyan Wang, Girish A. Koushik, Yali Du, Ying Wen, Jun Wang
Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks.
1 code implementation • 9 Feb 2024 • Muning Wen, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen
At the heart of ETPO is our novel per-token soft Bellman update, designed to harmonize the RL process with the principles of language modeling.
1 code implementation • 29 Dec 2023 • Xinyuan Wu, Wentao Dong, Hang Lai, Yong Yu, Ying Wen
Quadruped robots have strong adaptability to extreme environments but may also experience faults.
no code implementations • 21 Dec 2023 • Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao
Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Return-Conditioned Supervised Learning (RCSL), a paradigm that learns the action distribution based on target returns for each state in a supervised manner.
no code implementations • 23 Nov 2023 • Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, Guoliang Fan
The remarkable progress in Large Language Models (LLMs) opens up new avenues for addressing planning and decision-making problems in Multi-Agent Systems (MAS).
1 code implementation • 8 Oct 2023 • Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai
This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers).
no code implementations • 8 Oct 2023 • Xihuai Wang, Shao Zhang, WenHao Zhang, Wentao Dong, Jingxiao Chen, Ying Wen, Weinan Zhang
Current evaluation methods for zero-shot coordination (ZSC) capability still fall short in constructing diverse evaluation partners and in comprehensively measuring that capability.
1 code implementation • 29 Sep 2023 • Xidong Feng, Ziyu Wan, Muning Wen, Stephen Marcus McAleer, Ying Wen, Weinan Zhang, Jun Wang
Empirical results across reasoning, planning, alignment, and decision-making tasks show that TS-LLM outperforms existing approaches and can handle trees with a depth of 64.
no code implementations • 8 Sep 2023 • Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun
Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.
no code implementations • 24 Jun 2023 • Muning Wen, Runji Lin, Hanjing Wang, Yaodong Yang, Ying Wen, Luo Mai, Jun Wang, Haifeng Zhang, Weinan Zhang
Transformer architectures have facilitated the development of large-scale and general-purpose sequence models for prediction tasks in natural language processing and computer vision, e.g., GPT-3 and Swin Transformer.
1 code implementation • 5 Jun 2023 • Yang Li, Shao Zhang, Jichen Sun, WenHao Zhang, Yali Du, Ying Wen, Xinbing Wang, Wei Pan
In order to solve cooperative incompatibility in learning and effectively address the problem in the context of ZSC, we introduce the Cooperative Open-ended LEarning (COLE) framework, which formulates open-ended objectives in cooperative games with two players using perspectives of graph theory to evaluate and pinpoint the cooperative capacity of each strategy.
no code implementations • 13 Feb 2023 • Xihuai Wang, Zheng Tian, Ziyu Wan, Ying Wen, Jun Wang, Weinan Zhang
In this paper, we propose the Agent-by-agent Policy Optimization (A2PO) algorithm to improve the sample efficiency and retain the guarantees of monotonic improvement for each agent during training.
1 code implementation • 9 Feb 2023 • Yang Li, Shao Zhang, Jichen Sun, Yali Du, Ying Wen, Xinbing Wang, Wei Pan
However, these approaches can result in a loss of learning and an inability to cooperate with certain strategies within the population, known as cooperative incompatibility.
1 code implementation • CVPR 2023 • Jiafeng Li, Ying Wen, Lianghua He
The proposed SCConv consists of two units: a spatial reconstruction unit (SRU) and a channel reconstruction unit (CRU).
1 code implementation • 24 Dec 2022 • Ying Wen, Ziyu Wan, Ming Zhou, Shufang Hou, Zhe Cao, Chenyang Le, Jingxiao Chen, Zheng Tian, Weinan Zhang, Jun Wang
The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems.
1 code implementation • 6 Oct 2022 • Shao Zhang, Yuting Jia, Hui Xu, Dakuo Wang, Toby Jia-Jun Li, Ying Wen, Xinbing Wang, Chenghu Zhou
Constructing a comprehensive, accurate, and useful scientific knowledge base is crucial for human researchers synthesizing scientific knowledge and for enabling AI-driven scientific discovery.
1 code implementation • 30 May 2022 • Muning Wen, Jakub Grudzien Kuba, Runji Lin, Weinan Zhang, Ying Wen, Jun Wang, Yaodong Yang
In this paper, we introduce a novel architecture named Multi-Agent Transformer (MAT) that effectively casts cooperative multi-agent reinforcement learning (MARL) into sequence modeling (SM) problems wherein the task is to map agents' observation sequence to agents' optimal action sequence.
1 code implementation • 22 May 2022 • Fanglei Sun, Yang Li, Ying Wen, Jingchen Hu, Jun Wang, Yang Yang, Kai Li
The MAFENN framework and algorithm are designed to enhance the learning capability of feedforward DL networks or their variations with simple data feedback.
1 code implementation • ACL 2022 • Yang Li, Cheng Yu, Guangzhi Sun, Hua Jiang, Fanglei Sun, Weiqin Zu, Ying Wen, Yang Yang, Jun Wang
Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems.
no code implementations • 21 Feb 2022 • Shao Zhang, Yuting Jia, Hui Xu, Ying Wen, Dakuo Wang, Xinbing Wang
Geoscientists, as well as researchers in many fields, need to read a huge amount of literature to locate, extract, and aggregate relevant results and data to enable future research or to build a scientific database, but there is no existing system to support this use case well.
no code implementations • 28 Jan 2022 • Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong Yu, Jun Wang
Policy Space Response Oracle methods (PSRO) provide a general solution to learn Nash equilibrium in two-player zero-sum games but suffer from two drawbacks: (1) the computation inefficiency due to the need for consistent meta-game evaluation via simulations, and (2) the exploration inefficiency due to finding the best response against a fixed meta-strategy at every epoch.
1 code implementation • 6 Dec 2021 • Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Weinan Zhang, Ying Wen, Haifeng Zhang, Jun Wang, Bo Xu
In this paper, we facilitate the research by providing large-scale datasets, and use them to examine the usage of the Decision Transformer in the context of MARL.
1 code implementation • NeurIPS 2021 • Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen Mcaleer, Ying Wen, Jun Wang, Yaodong Yang
When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.
1 code implementation • NeurIPS 2021 • Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu
With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.
no code implementations • 29 Sep 2021 • Linghui Meng, Muning Wen, Yaodong Yang, Chenyang Le, Xiyun Li, Haifeng Zhang, Ying Wen, Weinan Zhang, Jun Wang, Bo Xu
Offline reinforcement learning leverages static datasets to learn optimal policies with no necessity to access the environment.
7 code implementations • ICLR 2022 • Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, Yaodong Yang
In this paper, we extend the theory of trust region learning to MARL.
1 code implementation • 12 Jun 2021 • Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang
Trust region methods are widely applied in single-agent reinforcement learning problems due to their monotonic performance-improvement guarantee at every iteration.
no code implementations • 9 Jun 2021 • Xiangyu Liu, Hangtian Jia, Ying Wen, Yaodong Yang, Yujing Hu, Yingfeng Chen, Changjie Fan, Zhipeng Hu
With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.
1 code implementation • 5 Jun 2021 • Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang
Our framework is comprised of three key components: (1) a centralized task dispatching model, which supports the self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling, and meets the evaluation requirement of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms.
1 code implementation • 4 Jun 2021 • Xidong Feng, Oliver Slumbers, Ziyu Wan, Bo Liu, Stephen Mcaleer, Ying Wen, Jun Wang, Yaodong Yang
When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population.
3 code implementations • 14 Mar 2021 • Nicolas Perez Nieves, Yaodong Yang, Oliver Slumbers, David Henry Mguni, Ying Wen, Jun Wang
Promoting behavioural diversity is critical for solving games with non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors).
no code implementations • 15 Feb 2021 • Yaodong Yang, Jun Luo, Ying Wen, Oliver Slumbers, Daniel Graves, Haitham Bou Ammar, Jun Wang, Matthew E. Taylor
Multiagent reinforcement learning (MARL) has achieved a remarkable amount of success in solving various types of video games.
1 code implementation • 1 Jan 2021 • Ying Wen, Hui Chen, Yaodong Yang, Zheng Tian, Minne Li, Xu Chen, Jun Wang
We derive the lower bound of agents' payoff improvements for MATRL methods, and also prove the convergence of our method on the meta-game fixed points.
3 code implementations • 19 Oct 2020 • Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Cowen Rivers, Zheng Tian, Daniel Palenicek, Haitham Bou Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang
We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving.
no code implementations • 12 Oct 2020 • YuTao Shen, Ying Wen
The performance of convolutional neural networks (CNNs) can be improved by adjusting the interrelationship between channels with an attention mechanism.
1 code implementation • ICML 2020 • Yaodong Yang, Ying Wen, Li-Heng Chen, Jun Wang, Kun Shao, David Mguni, Wei-Nan Zhang
Though practical, current methods rely on restrictive assumptions to decompose the centralized value function across agents for execution.
1 code implementation • 21 Nov 2019 • Ying Wen, Kai Xie, Lianghua He
Encoder-decoder networks are commonly used in medical image segmentation due to their remarkable performance in hierarchical feature fusion.
1 code implementation • 17 May 2019 • Zheng Tian, Ying Wen, Zhichen Gong, Faiz Punakkath, Shihao Zou, Jun Wang
In a single-agent setting, reinforcement learning (RL) tasks can be cast into an inference problem by introducing a binary random variable o, which stands for the "optimality".
no code implementations • 4 Mar 2019 • Minne Li, Zheng Tian, Pranav Nashikkar, Ian Davies, Ying Wen, Jun Wang
Existing model-based reinforcement learning methods often study perception modeling and decision making separately.
no code implementations • 26 Jan 2019 • Ying Wen, Yaodong Yang, Rui Luo, Jun Wang
Though limited in real-world decision making, most multi-agent reinforcement learning (MARL) models assume perfectly rational agents -- a property hardly met due to individuals' cognitive limitations and/or the tractability of the decision problem.
no code implementations • ICLR 2019 • Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan
Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge.
no code implementations • 11 Sep 2018 • Yong Chen, Ming Zhou, Ying Wen, Yaodong Yang, Yufeng Su, Wei-Nan Zhang, Dell Zhang, Jun Wang, Han Liu
Deep Q-learning has achieved significant success in single-agent decision-making tasks.
no code implementations • 13 Sep 2017 • Yaodong Yang, Lantao Yu, Yiwei Bai, Jun Wang, Wei-Nan Zhang, Ying Wen, Yong Yu
We conduct an empirical study on discovering the ordered collective dynamics obtained by a population of intelligent agents, driven by million-agent reinforcement learning.
no code implementations • 5 Jul 2017 • Haifeng Zhang, Jun Wang, Zhiming Zhou, Wei-Nan Zhang, Ying Wen, Yong Yu, Wenxin Li
In typical reinforcement learning (RL), the environment is assumed given and the goal of the learning is to identify an optimal policy for the agent taking actions through its interactions with the environment.
2 code implementations • 29 Mar 2017 • Peng Peng, Ying Wen, Yaodong Yang, Quan Yuan, Zhenkun Tang, Haitao Long, Jun Wang
Many artificial intelligence (AI) applications require multiple intelligent agents to work in a collaborative effort.
11 code implementations • 1 Nov 2016 • Yanru Qu, Han Cai, Kan Ren, Wei-Nan Zhang, Yong Yu, Ying Wen, Jun Wang
Predicting user responses, such as clicks and conversions, is of great importance and has found its usage in many Web applications including recommender systems, web search and online advertising.
Ranked #1 on Click-Through Rate Prediction on iPinYou
no code implementations • 22 Jun 2016 • Ying Wen, Wei-Nan Zhang, Rui Luo, Jun Wang
Recently, the rapid development of word embedding and neural networks has brought new inspiration to various NLP and IR tasks.