no code implementations • ICML 2020 • Umer Siddique, Paul Weng, Matthieu Zimmer
During this analysis, we notably derive a new result in the standard RL setting, which is of independent interest: a novel bound on the approximation error, with respect to the optimal average reward, of a policy that is optimal for the discounted reward.
1 code implementation • 19 Feb 2024 • Jianshu Hu, Yunpeng Jiang, Paul Weng
To tackle this question, we analyze existing methods to better understand them and to uncover how they are connected.
1 code implementation • 4 Feb 2024 • Han Fang, Zhihao Song, Paul Weng, Yutong Ban
Recently, deep reinforcement learning has shown promising results for learning fast heuristics to solve routing problems.
no code implementations • 22 Dec 2023 • Timo Kaufmann, Paul Weng, Viktor Bengs, Eyke Hüllermeier
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function.
no code implementations • 16 Mar 2023 • Junqi Qian, Paul Weng, Chenmien Tan
LR4GPM alternates between two phases: (1) learning a (possibly vector) reward function used to fit the performance metric, and (2) training a policy to optimize an approximation of this performance metric based on the learned rewards.
no code implementations • 26 Dec 2021 • Claire Glanois, Xuening Feng, Zhaohui Jiang, Paul Weng, Matthieu Zimmer, Dong Li, Wulong Liu
We propose an efficient interpretable neuro-symbolic model to solve Inductive Logic Programming (ILP) problems.
no code implementations • 24 Dec 2021 • Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, Wulong Liu
To that aim, we distinguish interpretability (as a property of a model) and explainability (as a post-hoc operation, with the intervention of a proxy) and discuss them in the context of RL with an emphasis on the former notion.
no code implementations • 7 Oct 2021 • Wenbin Ouyang, Yisen Wang, Paul Weng, Shaochen Han
Since training on large instances is impractical, we design a novel deep RL approach with a focus on generalizability.
no code implementations • 6 Oct 2021 • Wenbin Ouyang, Yisen Wang, Shaochen Han, Zhejian Jin, Paul Weng
In this work, we propose a novel approach named MAGIC that includes a deep learning architecture and a DRL training method.
no code implementations • 15 Mar 2021 • Zhihao Ma, Yuzheng Zhuang, Paul Weng, Hankz Hankui Zhuo, Dong Li, Wulong Liu, Jianye Hao
To address this challenge and improve the transparency, we propose a Neural Symbolic Reinforcement Learning framework by introducing symbolic logic into DRL.
no code implementations • 26 Feb 2021 • Jianyi Zhang, Paul Weng
Safety in reinforcement learning (RL) is a key property in both training and execution in many domains such as autonomous driving or finance.
no code implementations • 23 Feb 2021 • Matthieu Zimmer, Xuening Feng, Claire Glanois, Zhaohui Jiang, Jianyi Zhang, Paul Weng, Dong Li, Jianye Hao, Wulong Liu
As a step in this direction, we propose a novel neural-logic architecture, called differentiable logic machine (DLM), that can solve both inductive logic programming (ILP) and reinforcement learning (RL) problems, where the solution can be interpreted as a first-order logic program.
no code implementations • 19 Feb 2021 • Ruibin Bai, Xinan Chen, Zhi-Long Chen, Tianxiang Cui, Shuhui Gong, Wentao He, Xiaoping Jiang, Huan Jin, Jiahuan Jin, Graham Kendall, Jiawei Li, Zheng Lu, Jianfeng Ren, Paul Weng, Ning Xue, Huayan Zhang
The Vehicle Routing Problem (VRP) is one of the most intensively studied combinatorial optimisation problems for which numerous models and algorithms have been proposed.
no code implementations • 1 Jan 2021 • Zhihao Ma, Yuzheng Zhuang, Paul Weng, Dong Li, Kun Shao, Wulong Liu, Hankz Hankui Zhuo, Jianye Hao
Recent progress in deep reinforcement learning (DRL) can be largely attributed to the use of neural networks.
3 code implementations • 17 Dec 2020 • Matthieu Zimmer, Claire Glanois, Umer Siddique, Paul Weng
As a solution method, we propose a novel neural network architecture, which is composed of two sub-networks specifically designed for taking into account the two aspects of fairness.
2 code implementations • 16 Oct 2020 • Jiancong Huang, Juan Rojas, Matthieu Zimmer, Hongmin Wu, Yisheng Guan, Paul Weng
Insufficient learning (due to convergence to local optima) results in under-performing policies whilst redundant learning wastes time and resources.
1 code implementation • 18 Aug 2020 • Umer Siddique, Paul Weng, Matthieu Zimmer
Since learning with discounted rewards is generally easier, this discussion further justifies finding a fair policy for the average reward by learning a fair policy for the discounted reward.
no code implementations • 29 May 2020 • Olivier Buffet, Olivier Pietquin, Paul Weng
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e.g., board games, video games or autonomous vehicles.
1 code implementation • 19 Oct 2019 • Yijiong Lin, Jiancong Huang, Matthieu Zimmer, Juan Rojas, Paul Weng
Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements.
1 code implementation • 24 Sep 2019 • Yijiong Lin, Jiancong Huang, Matthieu Zimmer, Yisheng Guan, Juan Rojas, Paul Weng
Our work demonstrates that invariant transformations on RL trajectories are a promising methodology to speed up learning in deep RL.
no code implementations • 24 Jul 2019 • Paul Weng
Decision support systems (e.g., for ecological conservation) and autonomous systems (e.g., adaptive controllers in smart cities) are starting to be deployed in real applications.
1 code implementation • 10 Jun 2019 • Matthieu Zimmer, Paul Weng
In the context of learning deterministic policies in continuous domains, we revisit an approach that was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC).
1 code implementation • 25 Mar 2019 • Qitian Wu, Hengrui Zhang, Xiaofeng Gao, Peng He, Paul Weng, Han Gao, Guihai Chen
Social recommendation leverages social information to solve data sparsity and cold-start problems in traditional collaborative filtering methods.
Ranked #1 on Recommendation Systems on WeChat
no code implementations • ICML 2017 • Robert Busa-Fekete, Balazs Szorenyi, Paul Weng, Shie Mannor
We study the multi-armed bandit (MAB) problem where the agent receives a vectorial feedback that encodes many possibly competing objectives to be optimized.
no code implementations • 3 Jan 2017 • Dajian Li, Paul Weng, Orkun Karabasoglu
We also present a case study of our algorithm on the Manhattan, NYC, transportation network.
no code implementations • 3 Jan 2017 • Paul Weng
In this paper, we present a link between preference-based and multiobjective sequential decision-making.
no code implementations • 1 Dec 2016 • Hugo Gilbert, Paul Weng, Yan Xu
In the Markov decision process model, policies are usually evaluated by expected cumulative rewards.
no code implementations • 3 Nov 2016 • Hugo Gilbert, Paul Weng
In reinforcement learning, the standard criterion to evaluate policies in a state is the expectation of the (discounted) sum of rewards.
no code implementations • 26 Sep 2013 • Patrice Perny, Paul Weng, Judy Goldsmith, Josiah Hanna
This paper is devoted to fair optimization in Multiobjective Markov Decision Processes (MOMDPs).