no code implementations • 23 Oct 2023 • Runzhe Wu, Wen Sun
Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be efficient in terms of statistical complexity, computational complexity, and query complexity.
no code implementations • 24 Jul 2023 • Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
We consider the problems of contextual bandits and imitation learning, in which the learner lacks direct knowledge of the executed action's reward.
no code implementations • NeurIPS 2023 • Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun
In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation.
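In the simplest finite-model case, a likelihood-based confidence set of this flavor can be sketched in a few lines: keep every candidate model whose total log-likelihood is within a slack of the maximum-likelihood model. The helper below is a hypothetical illustration, not the paper's actual construction (which handles general function classes and distributional Bellman targets).

```python
import numpy as np

def mle_confidence_set(models, data, beta):
    """Keep every candidate model whose total log-likelihood is within
    slack `beta` of the maximum-likelihood model (finite model class
    for simplicity)."""
    log_liks = [sum(np.log(p(x)) for x in data) for p in models]
    best = max(log_liks)
    return [p for p, ll in zip(models, log_liks) if ll >= best - beta]

# Two candidate Bernoulli models for observations in {0, 1}.
p_good = lambda x: 0.7 if x == 1 else 0.3
p_bad = lambda x: 0.2 if x == 1 else 0.8
conf_set = mle_confidence_set([p_good, p_bad], [1, 1, 1, 0, 1], beta=1.0)
```

With this data the poorly-fitting model falls outside the slack, so only `p_good` survives; shrinking `beta` tightens the set as more data arrives.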
1 code implementation • 19 Feb 2023 • Runzhe Wu, Masatoshi Uehara, Wen Sun
Our theoretical results show that for both finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively.
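The flavor of this guarantee can be illustrated in the simplest possible case: the MLE of a categorical distribution on a fixed support is its empirical frequency vector, whose total variation distance to the ground truth shrinks with sample size. This is purely illustrative; FLE itself fits conditional return distributions with function approximation.

```python
import numpy as np

def fit_categorical_mle(samples, support):
    """The MLE of a categorical distribution on a fixed support is the
    vector of empirical frequencies."""
    samples = np.asarray(samples)
    counts = np.array([(samples == z).sum() for z in support], dtype=float)
    return counts / counts.sum()

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

rng = np.random.default_rng(0)
true_p = np.array([0.5, 0.3, 0.2])
samples = rng.choice(3, size=10_000, p=true_p)
est = fit_categorical_mle(samples, support=[0, 1, 2])
```

At 10,000 samples the TV distance between `est` and `true_p` is already well under 0.05, matching the intuition that the MLE concentrates around the ground truth.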
no code implementations • NeurIPS 2021 • Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang
In constrained multi-objective RL, the goal is to learn a policy that achieves the best performance specified by a multi-objective preference function under a constraint.
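A standard way to handle such a constraint is a Lagrangian relaxation: best-respond to a scalarized objective while ascending a multiplier on the constraint violation. The toy sketch below (over a finite set of candidate value vectors, with hypothetical names) illustrates that idea; it is not the paper's algorithm.

```python
import numpy as np

def solve_constrained(policies, f, g, c, lr=0.1, iters=200):
    """Dual (Lagrangian) iteration over a finite set of candidate value
    vectors: best-respond to f(V) - lam * max(0, g(V) - c), track the
    best feasible iterate, and ascend the multiplier on the violation."""
    lam, best = 0.0, None
    for _ in range(iters):
        v = max(policies, key=lambda V: f(V) - lam * max(0.0, g(V) - c))
        if g(v) <= c and (best is None or f(v) > f(best)):
            best = v  # keep the best feasible iterate seen so far
        lam = max(0.0, lam + lr * (g(v) - c))  # dual ascent on violation
    return best

# Preference f rewards the first objective; cost g caps it at c = 0.6.
policies = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
best = solve_constrained(policies, f=lambda V: V[0], g=lambda V: V[0], c=0.6)
```

The unconstrained optimum `[1.0, 0.0]` violates the budget, so once the multiplier grows large enough the iteration settles on the best feasible vector `[0.5, 0.5]`.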
Multi-Objective Reinforcement Learning
1 code implementation • 5 Jun 2021 • Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang
Our framework comprises three key components: (1) a centralized task dispatching model, which supports self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling and meets the evaluation requirements of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployment on different distributed computing paradigms.
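The Actor-Evaluator-Learner decomposition can be caricatured in a few lines: actors generate rollouts for a policy combination, evaluators score those rollouts to drive auto-curriculum task generation, and learners consume the results to update policies. All class names and interfaces below are illustrative stand-ins, not MALib's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Actor:
    """Generates rollouts for a policy combination (parallelized in the
    real framework; here we just fabricate per-agent returns)."""
    def sample(self, combo: Tuple[float, ...]) -> List[float]:
        return [float(p) for p in combo]

@dataclass
class Evaluator:
    """Scores a policy combination, driving auto-curriculum task generation."""
    def evaluate(self, returns: List[float]) -> float:
        return sum(returns) / len(returns)

@dataclass
class Learner:
    """Consumes evaluated rollouts and emits updated policies."""
    def update(self, combo, score):
        return tuple(p + 0.1 * score for p in combo)  # stand-in update step

def dispatch(combos, actor, evaluator, learner):
    """Centralized task dispatching: evaluate each heterogeneous policy
    combination and hand the scored results to the learner."""
    scored = [(c, evaluator.evaluate(actor.sample(c))) for c in combos]
    return [learner.update(c, s) for c, s in scored]

new_combos = dispatch([(1.0, 2.0)], Actor(), Evaluator(), Learner())
```

Separating the three roles behind narrow interfaces is what lets sampling, evaluation, and learning scale independently across workers.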