no code implementations • 23 Oct 2023 • Runzhe Wu, Wen Sun
Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be efficient in terms of statistical complexity, computational complexity, and query complexity.
no code implementations • 24 Jul 2023 • Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
We consider the problems of contextual bandits and imitation learning, in which the learner lacks direct knowledge of the executed action's reward.
no code implementations • NeurIPS 2023 • Kaiwen Wang, Kevin Zhou, Runzhe Wu, Nathan Kallus, Wen Sun
In online RL, we propose a DistRL algorithm that constructs confidence sets using maximum likelihood estimation.
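In the simplest finite-model case, a likelihood-based confidence set of this flavor can be sketched in a few lines: keep every candidate model whose total log-likelihood is within a slack of the maximum-likelihood model. The helper below is a hypothetical illustration, not the paper's actual construction (which handles general function classes and distributional Bellman targets).

```python
import numpy as np

def mle_confidence_set(models, data, beta):
    """Keep every candidate model whose total log-likelihood is within
    slack `beta` of the maximum-likelihood model (finite model class
    for simplicity)."""
    log_liks = [sum(np.log(p(x)) for x in data) for p in models]
    best = max(log_liks)
    return [p for p, ll in zip(models, log_liks) if ll >= best - beta]

# Two candidate Bernoulli models for observations in {0, 1}.
p_good = lambda x: 0.7 if x == 1 else 0.3
p_bad = lambda x: 0.2 if x == 1 else 0.8
conf_set = mle_confidence_set([p_good, p_bad], [1, 1, 1, 0, 1], beta=1.0)
```

With this data the poorly-fitting model falls outside the slack, so only `p_good` survives; shrinking `beta` tightens the set as more data arrives.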
1 code implementation • 19 Feb 2023 • Runzhe Wu, Masatoshi Uehara, Wen Sun
Our theoretical results show that for both finite-horizon and infinite-horizon discounted settings, FLE can learn distributions that are close to the ground truth under total variation distance and Wasserstein distance, respectively.
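The flavor of this guarantee can be illustrated in the simplest possible case: the MLE of a categorical distribution on a fixed support is its empirical frequency vector, whose total variation distance to the ground truth shrinks with sample size. This is purely illustrative; FLE itself fits conditional return distributions with function approximation.

```python
import numpy as np

def fit_categorical_mle(samples, support):
    """The MLE of a categorical distribution on a fixed support is the
    vector of empirical frequencies."""
    samples = np.asarray(samples)
    counts = np.array([(samples == z).sum() for z in support], dtype=float)
    return counts / counts.sum()

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

rng = np.random.default_rng(0)
true_p = np.array([0.5, 0.3, 0.2])
samples = rng.choice(3, size=10_000, p=true_p)
est = fit_categorical_mle(samples, support=[0, 1, 2])
```

At 10,000 samples the TV distance between `est` and `true_p` is already well under 0.05, matching the intuition that the MLE concentrates around the ground truth.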
no code implementations • NeurIPS 2021 • Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang
In constrained multi-objective RL, the goal is to learn a policy that achieves the best performance specified by a multi-objective preference function under a constraint.
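A standard way to handle such a constraint is a Lagrangian relaxation: best-respond to a scalarized objective while ascending a multiplier on the constraint violation. The toy sketch below (over a finite set of candidate value vectors, with hypothetical names) illustrates that idea; it is not the paper's algorithm.

```python
import numpy as np

def solve_constrained(policies, f, g, c, lr=0.1, iters=200):
    """Dual (Lagrangian) iteration over a finite set of candidate value
    vectors: best-respond to f(V) - lam * max(0, g(V) - c), track the
    best feasible iterate, and ascend the multiplier on the violation."""
    lam, best = 0.0, None
    for _ in range(iters):
        v = max(policies, key=lambda V: f(V) - lam * max(0.0, g(V) - c))
        if g(v) <= c and (best is None or f(v) > f(best)):
            best = v  # keep the best feasible iterate seen so far
        lam = max(0.0, lam + lr * (g(v) - c))  # dual ascent on violation
    return best

# Preference f rewards the first objective; cost g caps it at c = 0.6.
policies = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
best = solve_constrained(policies, f=lambda V: V[0], g=lambda V: V[0], c=0.6)
```

The unconstrained optimum `[1.0, 0.0]` violates the budget, so once the multiplier grows large enough the iteration settles on the best feasible vector `[0.5, 0.5]`.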
Multi-Objective Reinforcement Learning
1 code implementation • 5 Jun 2021 • Ming Zhou, Ziyu Wan, Hanjing Wang, Muning Wen, Runzhe Wu, Ying Wen, Yaodong Yang, Weinan Zhang, Jun Wang
Our framework comprises three key components: (1) a centralized task dispatching model, which supports self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling and meets the evaluation requirements of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployment on different distributed computing paradigms.
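The Actor-Evaluator-Learner decomposition can be caricatured in a few lines: actors generate rollouts for a policy combination, evaluators score those rollouts to drive auto-curriculum task generation, and learners consume the results to update policies. All class names and interfaces below are illustrative stand-ins, not MALib's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Actor:
    """Generates rollouts for a policy combination (parallelized in the
    real framework; here we just fabricate per-agent returns)."""
    def sample(self, combo: Tuple[float, ...]) -> List[float]:
        return [float(p) for p in combo]

@dataclass
class Evaluator:
    """Scores a policy combination, driving auto-curriculum task generation."""
    def evaluate(self, returns: List[float]) -> float:
        return sum(returns) / len(returns)

@dataclass
class Learner:
    """Consumes evaluated rollouts and emits updated policies."""
    def update(self, combo, score):
        return tuple(p + 0.1 * score for p in combo)  # stand-in update step

def dispatch(combos, actor, evaluator, learner):
    """Centralized task dispatching: evaluate each heterogeneous policy
    combination and hand the scored results to the learner."""
    scored = [(c, evaluator.evaluate(actor.sample(c))) for c in combos]
    return [learner.update(c, s) for c, s in scored]

new_combos = dispatch([(1.0, 2.0)], Actor(), Evaluator(), Learner())
```

Separating the three roles behind narrow interfaces is what lets sampling, evaluation, and learning scale independently across workers.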