Search Results for author: Jianye Hao

Found 150 papers, 37 papers with code

SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models

no code implementations • 6 Mar 2024 • Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao

To bridge the gap with real-world requirements, we introduce SheetRM, a benchmark featuring long-horizon and multi-category tasks with reasoning-dependent manipulation caused by real-life challenges.

Language Modelling Large Language Model

Reinforced In-Context Black-Box Optimization

1 code implementation • 27 Feb 2024 • Lei Song, Chenxiao Gao, Ke Xue, Chenyang Wu, Dong Li, Jianye Hao, Zongzhang Zhang, Chao Qian

In this paper, we propose RIBBO, a method to reinforce-learn a BBO algorithm from offline data in an end-to-end fashion.

In-Context Learning Meta-Learning

MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint

no code implementations • 22 Feb 2024 • Xinglin Zhou, Yifu Yuan, Shaofu Yang, Jianye Hao

To address the issue, we propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints (MENTOR).

Hierarchical Reinforcement Learning reinforcement-learning

Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models

no code implementations • 22 Feb 2024 • Jinyi Liu, Yifu Yuan, Jianye Hao, Fei Ni, Lingzhi Fu, Yibin Chen, Yan Zheng

Recently, there has been considerable attention towards leveraging large language models (LLMs) to enhance decision-making processes.

Decision Making Robot Manipulation

DiffuserLite: Towards Real-time Diffusion Planning

no code implementations • 27 Jan 2024 • Zibin Dong, Jianye Hao, Yifu Yuan, Fei Ni, Yitian Wang, Pengyi Li, Yan Zheng

Diffusion planning has been recognized as an effective decision-making paradigm in various domains.

D4RL Decision Making

Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey

1 code implementation • 22 Jan 2024 • Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng, Ke Tang

Specifically, we systematically summarize recent advancements in relevant algorithms and identify three primary research directions: EA-assisted optimization of RL, RL-assisted optimization of EA, and synergistic optimization of EA and RL.

Evolutionary Algorithms reinforcement-learning +1

Machine Learning Insides OptVerse AI Solver: Design Principles and Applications

no code implementations • 11 Jan 2024 • Xijun Li, Fangzhou Zhu, Hui-Ling Zhen, Weilin Luo, Meng Lu, Yimin Huang, Zhenan Fan, Zirui Zhou, Yufei Kuang, Zhihai Wang, Zijie Geng, Yang Li, Haoyang Liu, Zhiwu An, Muming Yang, Jianshu Li, Jie Wang, Junchi Yan, Defeng Sun, Tao Zhong, Yong Zhang, Jia Zeng, Mingxuan Yuan, Jianye Hao, Jun Yao, Kun Mao

To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances, and to surpass the capabilities of traditional optimization techniques.

Decision Making Management

LLM4EDA: Emerging Progress in Large Language Models for Electronic Design Automation

1 code implementation • 28 Dec 2023 • RuiZhe Zhong, Xingbo Du, Shixiong Kai, Zhentao Tang, Siyuan Xu, Hui-Ling Zhen, Jianye Hao, Qiang Xu, Mingxuan Yuan, Junchi Yan

Since circuits can be represented with HDL in a textual format, it is reasonable to question whether LLMs can be leveraged in the EDA field to achieve fully automated chip design and generate circuits with improved power, performance, and area (PPA).

Answer Generation Chatbot

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

no code implementations • 19 Dec 2023 • Jinyi Liu, Zhi Wang, Yan Zheng, Jianye Hao, Chenjia Bai, Junjie Ye, Zhen Wang, Haiyin Piao, Yang Sun

In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty.

Continuous Control

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

no code implementations • 1 Nov 2023 • Yi Ma, Chenjun Xiao, Hebin Liang, Jianye Hao

Decision Transformer (DT) is an innovative algorithm leveraging recent advances of the transformer architecture in reinforcement learning (RL).

Decision Making Hierarchical Reinforcement Learning +3

AlignDiff: Aligning Diverse Human Preferences via Behavior-Customisable Diffusion Model

no code implementations • 3 Oct 2023 • Zibin Dong, Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Tangjie Lv, Changjie Fan, Zhipeng Hu

Aligning agent behaviors with diverse human preferences remains a challenging problem in reinforcement learning (RL), owing to the inherent abstractness and mutability of human preferences.

Attribute Reinforcement Learning (RL)

HarmonyDream: Task Harmonization Inside World Models

no code implementations • 30 Sep 2023 • Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long

Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling.

Atari Games 100k Model-based Reinforcement Learning +1

A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design

1 code implementation • 22 Aug 2023 • Zhihai Wang, Lei Chen, Jie Wang, Xing Li, Yinqi Bai, Xijun Li, Mingxuan Yuan, Jianye Hao, Yongdong Zhang, Feng Wu

In particular, we notice that the runtime of the Resub and Mfs2 operators often dominates the overall runtime of LS optimization processes.

Domain Generalization

Uncertainty-aware Consistency Learning for Cold-Start Item Recommendation

no code implementations • 7 Aug 2023 • Taichi Liu, Chen Gao, Zhenyu Wang, Dong Li, Jianye Hao, Depeng Jin, Yong Li

Graph Neural Network (GNN)-based models have become the mainstream approach for recommender systems.

Recommendation Systems

BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization

1 code implementation • 1 Aug 2023 • Junyi Wang, Yuanyang Zhu, Zhi Wang, Yan Zheng, Jianye Hao, Chunlin Chen

Evolutionary reinforcement learning (ERL) algorithms have recently attracted attention for tackling complex reinforcement learning (RL) problems due to their high parallelism, but they are prone to insufficient exploration or model collapse without careful tuning of hyperparameters (aka meta-parameters).

Bilevel Optimization reinforcement-learning +1

Exploiting Counter-Examples for Active Learning with Partial labels

no code implementations • 14 Jul 2023 • Fei Zhang, Yunjie Ye, Lei Feng, Zhongwen Rao, Jieming Zhu, Marcus Kalander, Chen Gong, Jianye Hao, Bo Han

In this setting, an oracle annotates the query samples with partial labels, relaxing the oracle from the demanding accurate labeling process.

Active Learning

Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning

no code implementations • 27 Jun 2023 • Jinyi Liu, Yi Ma, Jianye Hao, Yujing Hu, Yan Zheng, Tangjie Lv, Changjie Fan

In summary, our research emphasizes the significance of trajectory-based data sampling techniques in enhancing the efficiency and performance of offline RL algorithms.

D4RL Offline RL +2

ChiPFormer: Transferable Chip Placement via Offline Decision Transformer

no code implementations • 26 Jun 2023 • Yao Lai, Jinxin Liu, Zhentao Tang, Bin Wang, Jianye Hao, Ping Luo

To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data.

Offline RL Reinforcement Learning (RL)

Improving Offline-to-Online Reinforcement Learning with Q-Ensembles

no code implementations • 12 Jun 2023 • Kai Zhao, Yi Ma, Jianye Hao, Jinyi Liu, Yan Zheng, Zhaopeng Meng

Offline reinforcement learning (RL) is a learning paradigm where an agent learns from a fixed dataset of experience.

Offline RL reinforcement-learning +1

Iteratively Refined Behavior Regularization for Offline Reinforcement Learning

no code implementations • 9 Jun 2023 • Xiaohan Hu, Yi Ma, Chenjun Xiao, Yan Zheng, Jianye Hao

One of the fundamental challenges for offline reinforcement learning (RL) is ensuring robustness to data distribution.

D4RL Offline RL +2

MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL

no code implementations • 31 May 2023 • Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, Zhixuan Liang

Recently, the diffusion model has emerged as a promising backbone for the sequence modeling paradigm in offline reinforcement learning (RL).

Reinforcement Learning (RL)

GFlowNets with Human Feedback

no code implementations • 11 May 2023 • Yinchuan Li, Shuang Luo, Yunfeng Shao, Jianye Hao

We propose the GFlowNets with Human Feedback (GFlowHF) framework to improve the exploration ability when training AI models.

Generalized Universal Domain Adaptation with Generative Flow Networks

no code implementations • 8 May 2023 • Didi Zhu, Yinchuan Li, Yunfeng Shao, Jianye Hao, Fei Wu, Kun Kuang, Jun Xiao, Chao Wu

We introduce a new problem in unsupervised domain adaptation, termed as Generalized Universal Domain Adaptation (GUDA), which aims to achieve precise prediction of all target labels including unknown categories.

Universal Domain Adaptation Unsupervised Domain Adaptation

Structure Aware Incremental Learning with Personalized Imitation Weights for Recommender Systems

no code implementations • 2 May 2023 • Yuening Wang, Yingxue Zhang, Antonios Valkanas, Ruiming Tang, Chen Ma, Jianye Hao, Mark Coates

In contrast, for users who have static preferences, model performance can benefit greatly from preserving as much of the user's long-term preferences as possible.

Incremental Learning Knowledge Distillation +1

Generative Flow Networks for Precise Reward-Oriented Active Learning on Graphs

no code implementations • 24 Apr 2023 • Yinchuan Li, Zhigang Li, Wenqian Li, Yunfeng Shao, Yan Zheng, Jianye Hao

Many score-based active learning methods have been successfully applied to graph-structured data, aiming to reduce the number of labels and achieve better performance of graph neural networks based on predefined score functions.

Active Learning

Multi-agent Policy Reciprocity with Theoretical Guarantee

no code implementations • 12 Apr 2023 • Haozhi Wang, Yinchuan Li, Qing Wang, Yunfeng Shao, Jianye Hao

We then define an adjacency space for mismatched states and design a plug-and-play module for value iteration, which enables agents to infer more precise returns.

Continuous Control Multi-agent Reinforcement Learning +1

Traj-MAE: Masked Autoencoders for Trajectory Prediction

no code implementations • ICCV 2023 • Hao Chen, Jiaze Wang, Kun Shao, Furui Liu, Jianye Hao, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

Specifically, our Traj-MAE employs diverse masking strategies to pre-train the trajectory encoder and map encoder, allowing for the capture of social and temporal information among agents while leveraging the effect of environment from multiple granularities.

Autonomous Driving Trajectory Prediction

Out-of-distribution Detection with Implicit Outlier Transformation

1 code implementation • 9 Mar 2023 • Qizhou Wang, Junjie Ye, Feng Liu, Quanyu Dai, Marcus Kalander, Tongliang Liu, Jianye Hao, Bo Han

It leads to a min-max learning scheme -- searching to synthesize OOD data that leads to worst judgments and learning from such OOD data for uniform performance in OOD detection.

Out-of-Distribution Detection

CFlowNets: Continuous Control with Generative Flow Networks

no code implementations • 4 Mar 2023 • Yinchuan Li, Shuang Luo, Haozhi Wang, Jianye Hao

Generative flow networks (GFlowNets), as an emerging technique, can be used as an alternative to reinforcement learning for exploratory control tasks.

Active Learning Continuous Control +2

DAG Matters! GFlowNets Enhanced Explainer For Graph Neural Networks

1 code implementation • 4 Mar 2023 • Wenqian Li, Yinchuan Li, Zhigang Li, Jianye Hao, Yan Pang

Uncovering rationales behind predictions of graph neural networks (GNNs) has received increasing attention over the years.

Combinatorial Optimization

The Ladder in Chaos: A Simple and Effective Improvement to General DRL Algorithms by Policy Path Trimming and Boosting

no code implementations • 2 Mar 2023 • Hongyao Tang, Min Zhang, Jianye Hao

On typical MuJoCo and DeepMind Control Suite (DMC) benchmarks, we find common phenomena for TD3 and RAD agents: 1) the activity of policy network parameters is highly asymmetric and policy networks advance monotonically along very few major parameter directions; 2) severe detours occur in parameter update and harmonic-like changes are observed for all minor parameter directions.

Reinforcement Learning (RL)

Spectral Augmentations for Graph Contrastive Learning

no code implementations • 6 Feb 2023 • Amur Ghose, Yingxue Zhang, Jianye Hao, Mark Coates

Contrastive learning has emerged as a premier method for learning representations with or without supervision.

Contrastive Learning Graph Embedding +1

Reweighted Interacting Langevin Diffusions: an Accelerated Sampling Method for Optimization

no code implementations • 30 Jan 2023 • Junlong Lyu, Zhitang Chen, Wenlong Lyu, Jianye Hao

We propose a new technique to accelerate sampling methods for solving difficult optimization problems.

Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning

1 code implementation • 20 Jan 2023 • Zifan Wu, Chao Yu, Chen Chen, Jianye Hao, Hankz Hankui Zhuo

In Model-based Reinforcement Learning (MBRL), model learning is critical since an inaccurate model can bias policy learning via generating misleading samples.

Decision Making Model-based Reinforcement Learning

Planning Immediate Landmarks of Targets for Model-Free Skill Transfer across Agents

no code implementations • 18 Dec 2022 • Minghuan Liu, Zhengbang Zhu, Menghui Zhu, Yuzheng Zhuang, Weinan Zhang, Jianye Hao

In reinforcement learning applications like robotics, agents usually need to deal with various input/output features when specified with different state/action spaces by their developers or physical restrictions.

State-Aware Proximal Pessimistic Algorithms for Offline Reinforcement Learning

no code implementations • 28 Nov 2022 • Chen Chen, Hongyao Tang, Yi Ma, Chao Wang, Qianli Shen, Dong Li, Jianye Hao

The key idea of SA-PP is leveraging discounted stationary state distribution ratios between the learning policy and the offline dataset to modulate the degree of behavior regularization in a state-wise manner, so that pessimism can be implemented in a more appropriate way.

Offline RL Q-Learning +2

Prototypical context-aware dynamics generalization for high-dimensional model-based reinforcement learning

no code implementations • 23 Nov 2022 • Junjie Wang, Yao Mu, Dong Li, Qichao Zhang, Dongbin Zhao, Yuzheng Zhuang, Ping Luo, Bin Wang, Jianye Hao

The latent world model provides a promising way to learn policies in a compact latent space for tasks with high-dimensional observations; however, its generalization across diverse environments with unseen dynamics remains challenging.

Model-based Reinforcement Learning reinforcement-learning +1

ERL-Re$^2$: Efficient Evolutionary Reinforcement Learning with Shared State Representation and Individual Policy Representation

1 code implementation • 26 Oct 2022 • Jianye Hao, Pengyi Li, Hongyao Tang, Yan Zheng, Xian Fu, Zhaopeng Meng

The state representation conveys expressive common features of the environment learned by all the agents collectively; the linear policy representation provides a favorable space for efficient policy optimization, where novel behavior-level crossover and mutation operations can be performed.

Continuous Control Evolutionary Algorithms +2

Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning

1 code implementation • 9 Oct 2022 • Yao Mu, Yuzheng Zhuang, Fei Ni, Bin Wang, Jianyu Chen, Jianye Hao, Ping Luo

This paper addresses such a challenge by Decomposed Mutual INformation Optimization (DOMINO) for context learning, which explicitly learns a disentangled context to maximize the mutual information between the context and historical trajectories, while minimizing the state transition prediction error.

Decision Making Meta Reinforcement Learning +2

EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model

no code implementations • 2 Oct 2022 • Yifu Yuan, Jianye Hao, Fei Ni, Yao Mu, Yan Zheng, Yujing Hu, Jinyi Liu, Yingfeng Chen, Changjie Fan

Unsupervised reinforcement learning (URL) poses a promising paradigm to learn useful behaviors in a task-agnostic environment without the guidance of extrinsic rewards to facilitate the fast adaptation of various downstream tasks.

reinforcement-learning Reinforcement Learning (RL) +2

On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies

no code implementations • 21 Sep 2022 • Haozhi Wang, Qing Wang, Yunfeng Shao, Dong Li, Jianye Hao, Yinchuan Li

Modern meta-reinforcement learning (Meta-RL) methods are mainly developed based on model-agnostic meta-learning, which performs policy gradient steps across tasks to maximize policy performance.

Continuous Control Meta-Learning +3

Mutual Harmony: Sequential Recommendation with Dual Contrastive Network

1 code implementation • 18 Sep 2022 • GuanYu Lin, Chen Gao, Yinfeng Li, Yu Zheng, Zhiheng Li, Depeng Jin, Dong Li, Jianye Hao, Yong Li

Such user-centric recommendation will make it impossible for the provider to expose their new items, failing to consider the accordant interactions between user and item dimensions.

Contrastive Learning Representation Learning +1

Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes

no code implementations • 16 Sep 2022 • Min Zhang, Hongyao Tang, Jianye Hao, Yan Zheng

First, we propose a unified policy abstraction theory, containing three types of policy abstraction associated to policy features at different levels.

Decision Making Metric Learning +2

GALOIS: Boosting Deep Reinforcement Learning via Generalizable Logic Synthesis

no code implementations • 27 May 2022 • Yushi Cao, Zhiming Li, Tianpei Yang, Hao Zhang, Yan Zheng, Yi Li, Jianye Hao, Yang Liu

In this paper, we combine the above two paradigms together and propose a novel Generalizable Logic Synthesis (GALOIS) framework to synthesize hierarchical and strict cause-effect logic programs.

Decision Making Program Synthesis +2

A Benchmark for Automatic Medical Consultation System: Frameworks, Tasks and Datasets

1 code implementation • 19 Apr 2022 • Wei Chen, Zhiwei Li, Hongyi Fang, Qianyuan Yao, Cheng Zhong, Jianye Hao, Qi Zhang, Xuanjing Huang, Jiajie Peng, Zhongyu Wei

In recent years, interest has arisen in using machine learning to improve the efficiency of automatic medical consultation and enhance patient experience.

Dialogue Act Classification Dialogue Understanding +4

PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations

no code implementations • 6 Apr 2022 • Tong Sang, Hongyao Tang, Yi Ma, Jianye Hao, Yan Zheng, Zhaopeng Meng, Boyan Li, Zhen Wang

In the online adaptation phase, with the environment context inferred from a few experiences collected in new environments, the policy is optimized by gradient ascent with respect to the PDVF.

Contrastive Learning Decision Making

PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration

1 code implementation • 16 Mar 2022 • Pengyi Li, Hongyao Tang, Tianpei Yang, Xiaotian Hao, Tong Sang, Yan Zheng, Jianye Hao, Matthew E. Taylor, Wenyuan Tao, Zhen Wang, Fazl Barez

However, we reveal sub-optimal collaborative behaviors also emerge with strong correlations, and simply maximizing the MI can, surprisingly, hinder the learning towards better collaboration.

Multi-agent Reinforcement Learning reinforcement-learning +1

Breaking the Curse of Dimensionality in Multiagent State Space: A Unified Agent Permutation Framework

no code implementations • 10 Mar 2022 • Xiaotian Hao, Hangyu Mao, Weixun Wang, Yaodong Yang, Dong Li, Yan Zheng, Zhen Wang, Jianye Hao

To break this curse, we propose a unified agent permutation framework that exploits the permutation invariance (PI) and permutation equivariance (PE) inductive biases to reduce the multiagent state space.

Data Augmentation Reinforcement Learning (RL) +1

Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization

2 code implementations • 4 Mar 2022 • Minghuan Liu, Zhengbang Zhu, Yuzheng Zhuang, Weinan Zhang, Jianye Hao, Yong Yu, Jun Wang

Recent progress in state-only imitation learning extends the scope of applicability of imitation learning to real-world settings by relieving the need for observing expert actions.

Imitation Learning Transfer Learning

Generalizable Information Theoretic Causal Representation

no code implementations • 17 Feb 2022 • Mengyue Yang, Xinyu Cai, Furui Liu, Xu Chen, Zhitang Chen, Jianye Hao, Jun Wang

It is evident that representation learning can improve a model's performance over multiple downstream tasks in many real-world scenarios, such as image classification and recommender systems.

counterfactual Image Classification +2

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

no code implementations • 9 Feb 2022 • Jian Zhao, Yue Zhang, Xunhan Hu, Weixun Wang, Wengang Zhou, Jianye Hao, Jiangcheng Zhu, Houqiang Li

In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards.

Debiased Recommendation with User Feature Balancing

no code implementations • 16 Jan 2022 • Mengyue Yang, Guohao Cai, Furui Liu, Zhenhua Dong, Xiuqiang He, Jianye Hao, Jun Wang, Xu Chen

To alleviate these problems, in this paper, we propose a novel debiased recommendation framework based on user feature balancing.

Causal Inference Recommendation Systems

A Survey on Interpretable Reinforcement Learning

no code implementations • 24 Dec 2021 • Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, Wulong Liu

To that aim, we distinguish interpretability (as a property of a model) and explainability (as a post-hoc operation, with the intervention of a proxy) and discuss them in the context of RL with an emphasis on the former notion.

Autonomous Driving Decision Making +2

ED2: Environment Dynamics Decomposition World Models for Continuous Control

1 code implementation • 6 Dec 2021 • Jianye Hao, Yifu Yuan, Cong Wang, Zhen Wang

Model-based reinforcement learning (MBRL) achieves significant sample efficiency in practice in comparison to model-free RL, but its performance is often limited by the existence of model prediction error.

Continuous Control Model-based Reinforcement Learning

A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems

no code implementations • NeurIPS 2021 • Yi Ma, Xiaotian Hao, Jianye Hao, Jiawen Lu, Xing Liu, Tong Xialiang, Mingxuan Yuan, Zhigang Li, Jie Tang, Zhaopeng Meng

To address this problem, existing methods partition the overall DPDP into fixed-size sub-problems by caching online generated orders and solve each sub-problem, or, building on this, use predicted future orders to further optimize each sub-problem.

Hierarchical Reinforcement Learning

Learning State Representations via Retracing in Reinforcement Learning

1 code implementation • ICLR 2022 • Changmin Yu, Dong Li, Jianye Hao, Jun Wang, Neil Burgess

We propose learning via retracing, a novel self-supervised approach for learning the state representation (and the associated dynamics model) for reinforcement learning tasks.

Continuous Control Model-based Reinforcement Learning +3

Uncertainty-aware Low-Rank Q-Matrix Estimation for Deep Reinforcement Learning

no code implementations • 19 Nov 2021 • Tong Sang, Hongyao Tang, Jianye Hao, Yan Zheng, Zhaopeng Meng

Such a reconstruction exploits the underlying structure of value matrix to improve the value approximation, thus leading to a more efficient learning process of value function.

Continuous Control reinforcement-learning +1

Lifelong Reinforcement Learning with Temporal Logic Formulas and Reward Machines

no code implementations • 18 Nov 2021 • Xuejing Zheng, Chao Yu, Chen Chen, Jianye Hao, Hankz Hankui Zhuo

In this paper, we propose Lifelong reinforcement learning with Sequential linear temporal logic formulas and Reward Machines (LSRM), which enables an agent to leverage previously learned knowledge to speed up learning of logically specified tasks.

reinforcement-learning Reinforcement Learning (RL) +1

SEIHAI: A Sample-efficient Hierarchical AI for the MineRL Competition

no code implementations • 17 Nov 2021 • Hangyu Mao, Chao Wang, Xiaotian Hao, Yihuan Mao, Yiming Lu, Chengjie WU, Jianye Hao, Dong Li, Pingzhong Tang

The MineRL competition is designed for the development of reinforcement learning and imitation learning algorithms that can efficiently leverage human demonstrations to drastically reduce the number of environment interactions needed to solve the complex ObtainDiamond task with sparse rewards.

Imitation Learning reinforcement-learning +1

Dynamic Bottleneck for Robust Self-Supervised Exploration

1 code implementation • NeurIPS 2021 • Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng Liu, Zhaoran Wang

Exploration methods based on pseudo-count of transitions or curiosity of dynamics have achieved promising results in solving reinforcement learning with sparse rewards.

Flattening Sharpness for Dynamic Gradient Projection Memory Benefits Continual Learning

1 code implementation • NeurIPS 2021 • Danruo Deng, Guangyong Chen, Jianye Hao, Qiong Wang, Pheng-Ann Heng

The backpropagation networks are notably susceptible to catastrophic forgetting, where networks tend to forget previously learned skills upon learning new ones.

Continual Learning

Ranking Cost: Building An Efficient and Scalable Circuit Routing Planner with Evolution-Based Optimization

1 code implementation • 8 Oct 2021 • Shiyu Huang, Bin Wang, Dong Li, Jianye Hao, Ting Chen, Jun Zhu

In this work, we propose a new algorithm for circuit routing, named Ranking Cost, which innovatively combines search-based methods (i.e., A* algorithm) and learning-based methods (i.e., Evolution Strategies) to form an efficient and trainable router.

Learning Explicit Credit Assignment for Multi-agent Joint Q-learning

no code implementations • 29 Sep 2021 • Hangyu Mao, Jianye Hao, Dong Li, Jun Wang, Weixun Wang, Xiaotian Hao, Bin Wang, Kun Shao, Zhen Xiao, Wulong Liu

In contrast, we formulate an explicit credit assignment problem where each agent gives its suggestion about how to weight individual Q-values to explicitly maximize the joint Q-value, besides guaranteeing the Bellman optimality of the joint Q-value.

Q-Learning

Learning Pseudometric-based Action Representations for Offline Reinforcement Learning

no code implementations • 29 Sep 2021 • Pengjie Gu, Mengchen Zhao, Chen Chen, Dong Li, Jianye Hao, Bo An

Offline reinforcement learning is a promising approach for practical applications since it does not require interactions with real-world environments.

Offline RL Recommendation Systems +4

Online Ad Hoc Teamwork under Partial Observability

no code implementations • ICLR 2022 • Pengjie Gu, Mengchen Zhao, Jianye Hao, Bo An

Autonomous agents often need to work together as a team to accomplish complex cooperative tasks.

Informative Robust Causal Representation for Generalizable Deep Learning

no code implementations • 29 Sep 2021 • Mengyue Yang, Furui Liu, Xu Chen, Zhitang Chen, Jianye Hao, Jun Wang

In many real-world scenarios, such as image classification and recommender systems, it is evident that representation learning can improve a model's performance over multiple downstream tasks.

counterfactual Image Classification +2

Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain

no code implementations • 14 Sep 2021 • Jianye Hao, Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Zhaopeng Meng, Peng Liu, Zhen Wang

In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks.

Autonomous Vehicles Efficient Exploration +3

CMML: Contextual Modulation Meta Learning for Cold-Start Recommendation

no code implementations • 24 Aug 2021 • Xidong Feng, Chen Chen, Dong Li, Mengchen Zhao, Jianye Hao, Jun Wang

Meta learning, especially the gradient-based kind, can be adopted to tackle this problem by learning initial parameters of the model and thus allowing fast adaptation to a specific task from limited data examples.

Computational Efficiency Meta-Learning +1

Modeling Scale-free Graphs with Hyperbolic Geometry for Knowledge-aware Recommendation

no code implementations • 14 Aug 2021 • Yankai Chen, Menglin Yang, Yingxue Zhang, Mengchen Zhao, Ziqiao Meng, Jianye Hao, Irwin King

Aiming to alleviate data sparsity and cold-start problems of traditional recommender systems, incorporating knowledge graphs (KGs) to supplement auxiliary information has recently gained considerable attention.

Knowledge-Aware Recommendation Knowledge Graphs

Contrastive ACE: Domain Generalization Through Alignment of Causal Mechanisms

no code implementations • 2 Jun 2021 • Yunqi Wang, Furui Liu, Zhitang Chen, Qing Lian, Shoubo Hu, Jianye Hao, Yik-Chung Wu

Domain generalization aims to learn knowledge invariant across different distributions while semantically meaningful for downstream tasks from multiple source domains, to improve the model's generalization ability on unseen target domains.

Domain Generalization

Cooperative Multi-Agent Transfer Learning with Level-Adaptive Credit Assignment

no code implementations • 1 Jun 2021 • Tianze Zhou, Fubiao Zhang, Kun Shao, Kai Li, Wenhan Huang, Jun Luo, Weixun Wang, Yaodong Yang, Hangyu Mao, Bin Wang, Dong Li, Wulong Liu, Jianye Hao

In addition, we use a novel agent network named Population Invariant agent with Transformer (PIT) to realize the coordination transfer in more varieties of scenarios.

Management Multi-agent Reinforcement Learning +3

Learning to Select Cuts for Efficient Mixed-Integer Programming

no code implementations • 28 May 2021 • Zeren Huang, Kerong Wang, Furui Liu, Hui-Ling Zhen, Weinan Zhang, Mingxuan Yuan, Jianye Hao, Yong Yu, Jun Wang

In online A/B testing on product planning problems with more than $10^7$ variables and constraints daily, Cut Ranking achieved an average speedup ratio of 12.42% over the production solver without any loss of solution accuracy.

Multiple Instance Learning

Ordering-Based Causal Discovery with Reinforcement Learning

1 code implementation • 14 May 2021 • Xiaoqiang Wang, Yali Du, Shengyu Zhu, Liangjun Ke, Zhitang Chen, Jianye Hao, Jun Wang

It is a long-standing question to discover causal relations among a set of variables in many empirical sciences.

Causal Discovery reinforcement-learning +2

Principled Exploration via Optimistic Bootstrapping and Backward Induction

1 code implementation • 13 May 2021 • Chenjia Bai, Lingxiao Wang, Lei Han, Jianye Hao, Animesh Garg, Peng Liu, Zhaoran Wang

In this paper, we propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).

Efficient Exploration Reinforcement Learning (RL)

An Adversarial Imitation Click Model for Information Retrieval

1 code implementation • 13 Apr 2021 • Xinyi Dai, Jianghao Lin, Weinan Zhang, Shuai Li, Weiwen Liu, Ruiming Tang, Xiuqiang He, Jianye Hao, Jun Wang, Yong Yu

Modern information retrieval systems, including web search, ads placement, and recommender systems, typically rely on learning from user feedback.

Imitation Learning Information Retrieval +2

Learning Symbolic Rules for Interpretable Deep Reinforcement Learning

no code implementations 15 Mar 2021 Zhihao Ma, Yuzheng Zhuang, Paul Weng, Hankz Hankui Zhuo, Dong Li, Wulong Liu, Jianye Hao

To address this challenge and improve the transparency, we propose a Neural Symbolic Reinforcement Learning framework by introducing symbolic logic into DRL.

reinforcement-learning Reinforcement Learning (RL)

Addressing Action Oscillations through Learning Policy Inertia

no code implementations 3 Mar 2021 Chen Chen, Hongyao Tang, Jianye Hao, Wulong Liu, Zhaopeng Meng

We propose Nested Policy Iteration as a general training algorithm for PIC-augmented policy which ensures monotonically non-decreasing updates under some mild conditions.

Atari Games Autonomous Driving +1

Differentiable Logic Machines

no code implementations 23 Feb 2021 Matthieu Zimmer, Xuening Feng, Claire Glanois, Zhaohui Jiang, Jianyi Zhang, Paul Weng, Dong Li, Jianye Hao, Wulong Liu

As a step in this direction, we propose a novel neural-logic architecture, called differentiable logic machine (DLM), that can solve both inductive logic programming (ILP) and reinforcement learning (RL) problems, where the solution can be interpreted as a first-order logic program.

Decision Making Inductive logic programming +1

Approximating Pareto Frontier through Bayesian-optimization-directed Robust Multi-objective Reinforcement Learning

no code implementations 1 Jan 2021 Xiangkun He, Jianye Hao, Dong Li, Bin Wang, Wulong Liu

Thirdly, the agent’s learning process is regarded as a black-box, and the comprehensive metric we proposed is computed after each episode of training, then a Bayesian optimization (BO) algorithm is adopted to guide the agent to evolve towards improving the quality of the approximated Pareto frontier.

Bayesian Optimization Multi-Objective Reinforcement Learning +1

Robust Memory Augmentation by Constrained Latent Imagination

no code implementations 1 Jan 2021 Yao Mu, Yuzheng Zhuang, Bin Wang, Wulong Liu, Shengbo Eben Li, Jianye Hao

The latent dynamics model summarizes an agent’s high dimensional experiences in a compact way.

Ranking Cost: One-Stage Circuit Routing by Directly Optimizing Global Objective Function

no code implementations 1 Jan 2021 Shiyu Huang, Bin Wang, Dong Li, Jianye Hao, Jun Zhu, Ting Chen

In our method, we introduce a new set of variables called cost maps, which can help the A* router to find out proper paths to achieve the global object.

Optimistic Exploration with Backward Bootstrapped Bonus for Deep Reinforcement Learning

no code implementations 1 Jan 2021 Chenjia Bai, Lingxiao Wang, Peng Liu, Zhaoran Wang, Jianye Hao, Yingnan Zhao

However, such an approach is challenging in developing practical exploration algorithms for Deep Reinforcement Learning (DRL).

Atari Games Efficient Exploration +3

Robust Multi-Agent Reinforcement Learning Driven by Correlated Equilibrium

no code implementations 1 Jan 2021 Yizheng Hu, Kun Shao, Dong Li, Jianye Hao, Wulong Liu, Yaodong Yang, Jun Wang, Zhanxing Zhu

Therefore, to achieve robust CMARL, we introduce novel strategies to encourage agents to learn correlated equilibrium while maximally preserving the convenience of the decentralized execution.

Adversarial Robustness reinforcement-learning +2

MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning

no code implementations 1 Jan 2021 Jinyi Liu, Zhi Wang, Jianye Hao, Yan Zheng

Recently, the principle of optimism in the face of (aleatoric and epistemic) uncertainty has been utilized to design efficient exploration strategies for Reinforcement Learning (RL).

Efficient Exploration reinforcement-learning +1

What About Inputing Policy in Value Function: Policy Representation and Policy-extended Value Function Approximator

no code implementations NeurIPS 2021 Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang

We study Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation.

Continuous Control Contrastive Learning +3

Event-Triggered Multi-agent Reinforcement Learning with Communication under Limited-bandwidth Constraint

no code implementations 10 Oct 2020 Guangzheng Hu, Yuanheng Zhu, Dongbin Zhao, Mengchen Zhao, Jianye Hao

Then the design of the event-triggered strategy is formulated as a constrained Markov decision problem, and reinforcement learning finds the best communication protocol that satisfies the limited bandwidth constraint.

Multiagent Systems

Dynamic Horizon Value Estimation for Model-based Reinforcement Learning

no code implementations 21 Sep 2020 Jun-Jie Wang, Qichao Zhang, Dongbin Zhao, Mengchen Zhao, Jianye Hao

Existing model-based value expansion methods typically leverage a world model for value estimation with a fixed rollout horizon to assist policy learning.

Model-based Reinforcement Learning Novelty Detection +2

Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets

no code implementations 19 May 2020 Cong Fei, Bin Wang, Yuzheng Zhuang, Zongzhang Zhang, Jianye Hao, Hongbo Zhang, Xuewu Ji, Wulong Liu

Generative adversarial imitation learning (GAIL) has shown promising results by taking advantage of generative adversarial nets, especially in the field of robot learning.

Autonomous Vehicles Data Augmentation +1

Continuous Multiagent Control using Collective Behavior Entropy for Large-Scale Home Energy Management

no code implementations 14 May 2020 Jianwen Sun, Yan Zheng, Jianye Hao, Zhaopeng Meng, Yang Liu

With the increasing popularity of electric vehicles, distributed energy generation and storage facilities in smart grid systems, an efficient Demand-Side Management (DSM) is urgent for energy savings and peak loads reduction.

energy management Management

Learning to Accelerate Heuristic Searching for Large-Scale Maximum Weighted b-Matching Problems in Online Advertising

no code implementations 9 May 2020 Xiaotian Hao, Junqi Jin, Jianye Hao, Jin Li, Weixun Wang, Yi Ma, Zhenzhe Zheng, Han Li, Jian Xu, Kun Gai

Bipartite b-matching is fundamental in algorithm design, and has been widely applied into economic markets, labor markets, etc.

CausalVAE: Structured Causal Disentanglement in Variational Autoencoder

1 code implementation CVPR 2021 Mengyue Yang, Furui Liu, Zhitang Chen, Xinwei Shen, Jianye Hao, Jun Wang

Learning disentanglement aims at finding a low dimensional representation which consists of multiple explanatory and generative factors of the observational data.

counterfactual Disentanglement

Efficient Deep Reinforcement Learning via Adaptive Policy Transfer

1 code implementation 19 Feb 2020 Tianpei Yang, Jianye Hao, Zhaopeng Meng, Zongzhang Zhang, Yujing Hu, Yingfeng Cheng, Changjie Fan, Weixun Wang, Wulong Liu, Zhaodong Wang, Jiajie Peng

Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks.

reinforcement-learning Reinforcement Learning (RL) +1

KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge

no code implementations 18 Feb 2020 Peng Zhang, Jianye Hao, Weixun Wang, Hongyao Tang, Yi Ma, Yihai Duan, Yan Zheng

Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge.

Common Sense Reasoning Continuous Control +2

Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning

no code implementations 3 Dec 2019 Hangyu Mao, Wulong Liu, Jianye Hao, Jun Luo, Dong Li, Zhengchao Zhang, Jun Wang, Zhen Xiao

Social psychology and real experiences show that cognitive consistency plays an important role to keep human society in order: if people have a more consistent cognition about their environments, they are more likely to achieve better cooperation.

Multi-agent Reinforcement Learning Q-Learning +2

Multi-Agent Game Abstraction via Graph Attention Neural Network

no code implementations 25 Nov 2019 Yong Liu, Weixun Wang, Yujing Hu, Jianye Hao, Xingguo Chen, Yang Gao

Traditional methods attempt to use pre-defined rules to capture the interaction relationship between agents.

Graph Attention Multi-agent Reinforcement Learning

There is Limited Correlation between Coverage and Robustness for Deep Neural Networks

no code implementations 14 Nov 2019 Yizhen Dong, Peixin Zhang, Jingyi Wang, Shuang Liu, Jun Sun, Jianye Hao, Xinyu Wang, Li Wang, Jin Song Dong, Dai Ting

In this work, we conduct an empirical study to evaluate the relationship between coverage, robustness and attack/defense metrics for DNN.

Face Recognition Malware Detection

MGHRL: Meta Goal-generation for Hierarchical Reinforcement Learning

no code implementations 30 Sep 2019 Haotian Fu, Hongyao Tang, Jianye Hao, Wulong Liu, Chen Chen

Most meta reinforcement learning (meta-RL) methods learn to adapt to new tasks by directly optimizing the parameters of policies over primitive action space.

Hierarchical Reinforcement Learning Meta-Learning +3

Efficient meta reinforcement learning via meta goal generation

no code implementations 25 Sep 2019 Haotian Fu, Hongyao Tang, Jianye Hao

Meta reinforcement learning (meta-RL) is able to accelerate the acquisition of new tasks by learning from past experience.

Meta-Learning Meta Reinforcement Learning +2

From Few to More: Large-scale Dynamic Multiagent Curriculum Learning

no code implementations 6 Sep 2019 Weixun Wang, Tianpei Yang, Yong Liu, Jianye Hao, Xiaotian Hao, Yujing Hu, Yingfeng Chen, Changjie Fan, Yang Gao

In this paper, we design a novel Dynamic Multiagent Curriculum Learning (DyMA-CL) to solve large-scale problems by starting from learning on a multiagent scenario with a small size and progressively increasing the number of agents.

Spectral-based Graph Convolutional Network for Directed Graphs

no code implementations 21 Jul 2019 Yi Ma, Jianye Hao, Yaodong Yang, Han Li, Junqi Jin, Guangyong Chen

Our approach can work directly on directed graph data in semi-supervised nodes classification tasks.

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

1 code implementation 12 Mar 2019 Haotian Fu, Hongyao Tang, Jianye Hao, Zihan Lei, Yingfeng Chen, Changjie Fan

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces.

Multi-agent Reinforcement Learning Q-Learning +2

A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

no code implementations NeurIPS 2018 Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, Changjie Fan

In multiagent domains, coping with non-stationary agents that change behaviors from time to time is a challenging problem, where an agent is usually required to be able to quickly detect the other agent's policy during online interaction, and then adapt its own policy accordingly.

Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction

no code implementations 25 Sep 2018 Hongyao Tang, Jianye Hao, Tangjie Lv, Yingfeng Chen, Zongzhang Zhang, Hangtian Jia, Chunxu Ren, Yan Zheng, Zhaopeng Meng, Changjie Fan, Li Wang

Besides, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning.

reinforcement-learning Reinforcement Learning (RL)

SCC-rFMQ Learning in Cooperative Markov Games with Continuous Actions

no code implementations 18 Sep 2018 Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen, Karl Tuyls, Zhiyong Feng, Wanli Xue, Rong Chen

Although many reinforcement learning methods have been proposed for learning the optimal solutions in single-agent continuous-action domains, multiagent coordination domains with continuous actions have received relatively few investigations.

reinforcement-learning Reinforcement Learning (RL)

Towards Efficient Detection and Optimal Response against Sophisticated Opponents

no code implementations 12 Sep 2018 Tianpei Yang, Zhaopeng Meng, Jianye Hao, Chongjie Zhang, Yan Zheng, Ze Zheng

This paper proposes a novel approach called Bayes-ToMoP which can efficiently detect the strategy of opponents using either stationary or higher-level reasoning strategies.

Multiagent Systems

Learning Adaptive Display Exposure for Real-Time Advertising

no code implementations 10 Sep 2018 Weixun Wang, Junqi Jin, Jianye Hao, Chunjie Chen, Chuan Yu, Wei-Nan Zhang, Jun Wang, Xiaotian Hao, Yixi Wang, Han Li, Jian Xu, Kun Gai

In this paper, we investigate the problem of advertising with adaptive exposure: can we dynamically determine the number and positions of ads for each user visit under certain business constraints so that the platform revenue can be increased?

An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems

no code implementations 13 May 2018 Hongyao Tang, Li Wang, Zan Wang, Tim Baarslag, Jianye Hao

Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both fixed-agent repeated interaction setting and the static social learning framework.

Falsification of Cyber-Physical Systems Using Deep Reinforcement Learning

no code implementations 1 May 2018 Takumi Akazaki, Shuang Liu, Yoriyuki Yamagata, Yihai Duan, Jianye Hao

With the rapid development of software and distributed computing, Cyber-Physical Systems (CPS) are widely adopted in many application areas, e.g., smart grid, autonomous automobile.

Distributed Computing reinforcement-learning +1

SA-IGA: A Multiagent Reinforcement Learning Method Towards Socially Optimal Outcomes

no code implementations 8 Mar 2018 Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen, Karl Tuyls, Wanli Xue

In multiagent environments, the capability of learning is important for an agent to behave appropriately in face of unknown opponents and dynamic environment.

Q-Learning reinforcement-learning +1

Blind Image Denoising via Dependent Dirichlet Process Tree

no code implementations 13 Jan 2016 Fengyuan Zhu, Guangyong Chen, Jianye Hao, Pheng-Ann Heng

This paper addresses this problem and proposes a novel blind image denoising algorithm to recover the clean image from noisy one with the unknown noise model.

Image Denoising Variational Inference
