Search Results for author: Runzhe Wu

Found 6 papers, 2 papers with code

Making RL with Preference-based Feedback Efficient via Randomization

no code implementations • 23 Oct 2023 • Runzhe Wu, Wen Sun

Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be efficient in terms of statistical complexity, computational complexity, and query complexity.

Active Learning · Thompson Sampling
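
To make the randomization idea concrete, here is a minimal, self-contained sketch of Thompson-style posterior sampling over a Bradley-Terry reward model learned from pairwise preferences. It illustrates the general technique only, not the paper's algorithm: the class name `PreferenceTS`, the linear reward model, and the Laplace-approximate posterior are assumptions made for this sketch.

```python
# Sketch: Thompson sampling over Bradley-Terry reward weights learned from
# pairwise preferences. Illustrative only; not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

class PreferenceTS:
    def __init__(self, dim, reg=1.0):
        self.dim = dim
        self.reg = reg      # L2 penalty = precision of a Gaussian prior on w
        self.diffs = []     # stored feature differences: x_winner - x_loser

    def _fit_map(self, iters=300):
        # MAP estimate of w by gradient ascent on the Bradley-Terry log posterior.
        w = np.zeros(self.dim)
        if not self.diffs:
            return w
        D = np.stack(self.diffs)
        lr = 1.0 / (1.0 + len(self.diffs))          # small step for stability
        for _ in range(iters):
            p = sigmoid(D @ w)                      # P(winner beats loser)
            w += lr * (D.T @ (1.0 - p) - self.reg * w)
        return w

    def sample_reward(self):
        # The randomization step: draw w from a Laplace-approximate posterior.
        w_map = self._fit_map()
        H = self.reg * np.eye(self.dim)             # Hessian of -log posterior
        for d in self.diffs:
            p = sigmoid(d @ w_map)
            H += p * (1.0 - p) * np.outer(d, d)
        return rng.multivariate_normal(w_map, np.linalg.inv(H))

    def act(self, action_features):
        # Act greedily with respect to the sampled reward function.
        return int(np.argmax(action_features @ self.sample_reward()))

    def observe(self, x_win, x_lose):
        self.diffs.append(x_win - x_lose)

# Toy loop: preferences come from an unknown Bradley-Terry model with w_star.
w_star = np.array([1.0, -0.5, 0.3])
agent = PreferenceTS(dim=3)
for t in range(50):
    actions = rng.normal(size=(5, 3))               # candidate action features
    i = agent.act(actions)
    j = int(rng.integers(5))                        # a comparison action
    if rng.random() < sigmoid((actions[i] - actions[j]) @ w_star):
        agent.observe(actions[i], actions[j])
    else:
        agent.observe(actions[j], actions[i])
```

Acting greedily on a posterior sample, rather than computing an optimistic exploration bonus, is what keeps each decision step computationally cheap in this sketch.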

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

no code implementations • 24 Jul 2023 • Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.

Imitation Learning · Multi-Armed Bandits
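
As a rough illustration of preference-based active querying (not the authors' method), the sketch below asks a pairwise-preference oracle only when a simple linear Bradley-Terry model is uncertain about its top two actions for the current context; the uncertainty threshold and the online logistic update are assumptions made for this sketch.

```python
# Sketch: query the preference oracle only when the model is uncertain.
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

dim, threshold = 4, 0.1          # query when |P(a1 beats a2) - 0.5| < threshold
w_hat = np.zeros(dim)            # current estimate of preference weights
w_star = rng.normal(size=dim)    # unknown ground-truth weights
queries = 0

for t in range(200):
    context_actions = rng.normal(size=(8, dim))     # action features this round
    a1, a2 = np.argsort(context_actions @ w_hat)[-2:]  # top two under the model
    d = context_actions[a1] - context_actions[a2]
    p = sigmoid(d @ w_hat)
    if abs(p - 0.5) < threshold:                    # uncertain -> ask the oracle
        queries += 1
        label = float(rng.random() < sigmoid(d @ w_star))
        w_hat += 0.5 * (label - p) * d              # one online logistic step
    # otherwise: commit to the higher-scoring action without querying

print(f"preference queries used: {queries} / 200 rounds")
```

Rounds where the model is already confident consume no queries, which is the sense in which query complexity can be far smaller than the number of rounds.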

Distributional Offline Policy Evaluation with Predictive Error Guarantees

1 code implementation • 19 Feb 2023 • Runzhe Wu, Masatoshi Uehara, Wen Sun

Our theoretical results show that for both finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively.
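
For intuition, here is a toy, tabular rendition of the fitted-likelihood idea: repeatedly fit a discretized return distribution by maximum likelihood to one-step bootstrapped samples r + γZ', with Z' drawn from the previous fit at the next state. The tabular states, fixed grid support, and frequency-count MLE are simplifications assumed for this sketch, not the paper's setup.

```python
# Sketch: fitted likelihood iteration for distributional OPE on a toy MDP.
import numpy as np

rng = np.random.default_rng(2)
n_states, n_atoms, gamma = 5, 51, 0.9
support = np.linspace(0.0, 10.0, n_atoms)           # grid of possible returns

# Offline dataset of (s, r, s') transitions under the evaluation policy.
data = [(s, float(rng.random()), int(rng.integers(n_states)))
        for s in rng.integers(n_states, size=2000)]

dist = np.full((n_states, n_atoms), 1.0 / n_atoms)  # initial return distribution

for k in range(30):                                  # fitted-likelihood iterations
    counts = np.ones((n_states, n_atoms))            # Laplace smoothing
    for s, r, s_next in data:
        z_next = rng.choice(support, p=dist[s_next]) # sample from previous fit
        z = r + gamma * z_next                       # bootstrapped return sample
        idx = np.abs(support - z).argmin()           # project onto the grid
        counts[s, idx] += 1.0
    dist = counts / counts.sum(axis=1, keepdims=True)  # categorical MLE

print("estimated mean return per state:", dist @ support)
```

Here the MLE step is plain frequency counting because the model class is categorical; with a richer conditional density model, the same fit becomes a generic supervised maximum-likelihood problem.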

Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration

no code implementations • NeurIPS 2021 • Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang

In constrained multi-objective RL, the goal is to learn a policy that achieves the best performance specified by a multi-objective preference function under a constraint.

Multi-Objective Reinforcement Learning · Reinforcement Learning
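
In generic notation (the preference function f, objective values J_i, constraint value J_c, and threshold τ are placeholders, not the paper's exact formulation), the problem and a dual treatment of its constraint read:

```latex
% Constrained multi-objective RL, generic form; all symbols are
% illustrative placeholders, not lifted from the paper.
\max_{\pi}\; f\bigl(J_1(\pi), \dots, J_m(\pi)\bigr)
\quad \text{s.t.} \quad J_c(\pi) \ge \tau
% A dual treatment replaces the constraint with a multiplier:
\min_{\lambda \ge 0}\; \max_{\pi}\;
  f\bigl(J_1(\pi), \dots, J_m(\pi)\bigr) + \lambda \bigl(J_c(\pi) - \tau\bigr)
```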

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

1 code implementation • 5 Jun 2021 • Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang

Our framework comprises three key components: (1) a centralized task-dispatching model, which supports self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling and meets the evaluation requirements of auto-curriculum learning; and (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployment on different distributed computing paradigms.

Atari Games · Distributed Computing · +3
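
The single-process sketch below mimics the Actor-Evaluator-Learner task flow described above: an actor produces scored rollouts, a learner consumes them to update a policy, and a central dispatch loop queues follow-up tasks that an evaluator can score. All names are illustrative stand-ins, not MALib's actual API, and a one-parameter toy objective replaces a real environment.

```python
# Sketch: Actor-Evaluator-Learner data flow, run sequentially for clarity.
# In MALib these stages run in parallel; names here are hypothetical.
import random
from collections import deque

class Actor:
    def rollout(self, policy, n=32):
        # Toy "sampling": perturb the policy parameter and score each
        # perturbation in a fake environment whose reward peaks at 3.0.
        pop = [policy + random.gauss(0.0, 0.5) for _ in range(n)]
        return [(p, -(p - 3.0) ** 2 + random.gauss(0.0, 0.1)) for p in pop]

class Learner:
    def update(self, policy, batch):
        # Toy "training": move toward the best-scoring perturbation.
        best, _ = max(batch, key=lambda pair: pair[1])
        return 0.5 * policy + 0.5 * best

class Evaluator:
    def score(self, policy, episodes=16):
        # Toy "evaluation" against the same fake objective.
        return sum(-(policy - 3.0) ** 2 + random.gauss(0.0, 0.1)
                   for _ in range(episodes)) / episodes

# Centralized dispatch loop: tasks flow actor -> learner -> evaluator.
tasks, policy = deque([0.0]), 0.0
actor, learner, evaluator = Actor(), Learner(), Evaluator()
for _ in range(50):
    policy = tasks.popleft()
    batch = actor.rollout(policy)            # sampling stage
    policy = learner.update(policy, batch)   # training stage
    tasks.append(policy)                     # dispatcher queues a follow-up task
print(f"parameter ~ 3.0: {policy:.2f}, eval score: {evaluator.score(policy):.2f}")
```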
