no code implementations • 19 Apr 2024 • Jianliang He, Han Zhong, Zhuoran Yang
Moreover, for AMDPs, we propose a novel complexity measure -- average-reward generalized eluder coefficient (AGEC) -- which captures the challenge of exploration in AMDPs with general function approximation.
no code implementations • 4 Apr 2024 • Miao Lu, Han Zhong, Tong Zhang, Jose Blanchet
Unlike previous work, which relies on a generative model or a pre-collected offline dataset enjoying good coverage of the deployment environment, we tackle robust RL via interactive data collection, where the learner interacts with the training environment only and refines the policy through trial and error.
1 code implementation • 15 Feb 2024 • Rui Yang, Xiaoman Pan, Feng Luo, Shuang Qiu, Han Zhong, Dong Yu, Jianshu Chen
We consider the problem of multi-objective alignment of foundation models with human preferences, which is a critical step towards helpful and harmless AI systems.
no code implementations • 28 Dec 2023 • Guhao Feng, Han Zhong
We first demonstrate that, for a broad class of Markov decision processes (MDPs), the model can be represented by constant-depth circuits with polynomial size or Multi-Layer Perceptrons (MLPs) with constant layers and polynomial hidden dimension.
1 code implementation • 18 Dec 2023 • Wei Xiong, Hanze Dong, Chenlu Ye, Ziqi Wang, Han Zhong, Heng Ji, Nan Jiang, Tong Zhang
This includes an iterative version of the Direct Preference Optimization (DPO) algorithm for online settings, and a multi-step rejection sampling strategy for offline scenarios.
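As background for the DPO component, here is a minimal sketch of the standard single-pair DPO objective (not the paper's iterative variant; the function name and arguments are illustrative):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l      : policy log-probs of the chosen / rejected response
    ref_logp_w / ref_logp_l : reference-model log-probs of the same responses
    beta                 : inverse-temperature on the implicit reward margin
    """
    # implicit reward margin between chosen and rejected responses
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # negative log-sigmoid of the margin
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; raising the chosen response's log-probability lowers the loss.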
no code implementations • 7 Dec 2023 • Jiayi Huang, Han Zhong, LiWei Wang, Lin F. Yang
To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed UCRL-WVTR, that is both \emph{horizon-free} and \emph{instance-dependent}: it eliminates the polynomial dependency on the planning horizon.
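UCRL-WVTR builds on weighted value-targeted regression; as a rough illustration of the weighted-regression primitive only (a one-dimensional closed-form sketch with made-up names, not the algorithm's actual multi-level weighting scheme):

```python
def weighted_ridge_1d(xs, ys, weights, lam=1.0):
    """Closed-form weighted ridge regression in one dimension:
    theta = (sum_i w_i x_i y_i) / (sum_i w_i x_i^2 + lam).

    Down-weighting high-variance targets (small w_i) is the basic
    mechanism behind variance-aware regression estimators.
    """
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = sum(w * x * x for w, x in zip(weights, xs)) + lam
    return num / den
```

With uniform weights and no regularization this reduces to ordinary least squares; a positive `lam` shrinks the estimate toward zero.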
2 code implementations • 19 Oct 2023 • Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han, Tong Zhang
Offline reinforcement learning (RL) presents a promising approach for learning effective policies from offline datasets without the need for costly or unsafe interactions with the environment.
no code implementations • NeurIPS 2023 • Jiayi Huang, Han Zhong, LiWei Wang, Lin F. Yang
Our algorithm, termed \textsc{Heavy-LSVI-UCB}, achieves the \emph{first} computationally efficient \emph{instance-dependent} $K$-episode regret of $\tilde{O}(d \sqrt{H \mathcal{U}^*} K^{\frac{1}{1+\epsilon}} + d \sqrt{H \mathcal{V}^* K})$.
1 code implementation • NeurIPS 2023 • Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration.
no code implementations • 21 Feb 2023 • Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, LiWei Wang
While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited.
no code implementations • NeurIPS 2023 • Yunchang Yang, Han Zhong, Tianhao Wu, Bin Liu, LiWei Wang, Simon S. Du
We study stochastic delayed feedback in general multi-agent sequential decision making, which includes bandits, single-agent Markov decision processes (MDPs), and Markov games (MGs).
no code implementations • 3 Nov 2022 • Han Zhong, Wei Xiong, Sirui Zheng, LiWei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang
The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values, and (ii) the log-likelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning.
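A toy instantiation of these two modifications over a finite hypothesis class might look as follows (the function names and the scalar temperature `eta` are illustrative; the actual framework operates over general function classes):

```python
import math
import random

def optimistic_posterior_sample(hypotheses, value, nll, eta=1.0):
    """Sample a hypothesis f with probability proportional to
    exp(eta * value(f)) * exp(-nll(f)):
    an optimistic prior biased toward high-value hypotheses,
    multiplied by the exponentiated negative empirical loss.
    """
    # log-weights, shifted by their max for numerical stability
    logw = [eta * value(f) - nll(f) for f in hypotheses]
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    # inverse-CDF sampling from the normalized weights
    r = random.random() * sum(w)
    acc = 0.0
    for f, wi in zip(hypotheses, w):
        acc += wi
        if r <= acc:
            return f
    return hypotheses[-1]
```

With a large value bonus the sampler concentrates on the optimistic (high-value) hypothesis, which is the intended exploration effect.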
no code implementations • 27 Oct 2022 • Jiachen Hu, Han Zhong, Chi Jin, LiWei Wang
Sim-to-real transfer trains RL agents in simulated environments and then deploys them in the real world.
no code implementations • 4 Oct 2022 • Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Tong Zhang
Existing studies on provably efficient algorithms for Markov games (MGs) almost exclusively build on the "optimism in the face of uncertainty" (OFU) principle.
no code implementations • 31 May 2022 • Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, LiWei Wang, Tong Zhang
We also extend our techniques to the two-player zero-sum Markov games (MGs), and establish a new performance lower bound for MGs, which tightens the existing result, and verifies the nearly minimax optimality of the proposed algorithm.
no code implementations • 27 May 2022 • Binghui Li, Jikai Jin, Han Zhong, John E. Hopcroft, LiWei Wang
Moreover, we establish an improved upper bound of $\exp({\mathcal{O}}(k))$ for the network size to achieve low robust generalization error when the data lies on a manifold with intrinsic dimension $k$ ($k \ll d$).
no code implementations • 23 May 2022 • Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, LiWei Wang
To the best of our knowledge, this is the first theoretical result for PbRL with (general) function approximation.
no code implementations • 15 Feb 2022 • Han Zhong, Wei Xiong, Jiyuan Tan, LiWei Wang, Tong Zhang, Zhaoran Wang, Zhuoran Yang
When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state space, and (iii) minimax optimization for equilibrium solving.
no code implementations • 27 Dec 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan
We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.
no code implementations • 21 Dec 2021 • Tianhao Wu, Yunchang Yang, Han Zhong, LiWei Wang, Simon S. Du, Jiantao Jiao
Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms.
no code implementations • NeurIPS 2021 • Han Zhong, Jiayi Huang, Lin F. Yang, LiWei Wang
Despite a large amount of effort in dealing with heavy-tailed error in machine learning, little is known about the case where moments of the error may not exist: the random noise $\eta$ satisfies $\Pr\left[|\eta| > |y|\right] \le 1/|y|^{\alpha}$ for some $\alpha > 0$.
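To see why such noise is problematic, one can sample from a Pareto-type tail that meets this condition with equality; for $\alpha \le 1$ even the first moment is infinite (a small illustrative experiment; all names are mine):

```python
import random

def pareto_tail_sample(alpha, rng):
    """Inverse-CDF draw from a Pareto tail: Pr[X > y] = y**(-alpha) for y >= 1.
    For alpha <= 1 the mean of X is infinite, so even the first moment
    of the noise fails to exist."""
    u = rng.random()                      # uniform on [0, 1)
    return (1.0 - u) ** (-1.0 / alpha)

rng = random.Random(0)
alpha = 0.5                               # mean does not exist for this alpha
xs = [pareto_tail_sample(alpha, rng) for _ in range(100_000)]
# empirical tail frequency Pr[X > 10] should be close to 10**(-0.5) ~ 0.316
tail = sum(x > 10 for x in xs) / len(xs)
```

The empirical tail frequency matches $y^{-\alpha}$, confirming the samples satisfy the stated tail condition.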
no code implementations • 18 Oct 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs).
no code implementations • 29 Sep 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael Jordan
To the best of our knowledge, we establish the first provably efficient RL algorithms for solving SNE in general-sum Markov games with leader-controlled state transitions.
no code implementations • ICLR 2022 • Yunchang Yang, Tianhao Wu, Han Zhong, Evrard Garcelon, Matteo Pirotta, Alessandro Lazaric, LiWei Wang, Simon S. Du
We also obtain a new upper bound for conservative low-rank MDPs.
no code implementations • 28 Dec 2020 • Han Zhong, Xun Deng, Ethan X. Fang, Zhuoran Yang, Zhaoran Wang, Runze Li
In particular, we focus on a variance-constrained policy optimization problem where the goal is to find a policy that maximizes the expected value of the long-run average reward, subject to a constraint that the long-run variance of the average reward is upper bounded by a threshold.
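In symbols, the variance-constrained problem described above can be written as follows (the notation $J$, $\Lambda$, $\lambda$, $\mu$ is ours, chosen only to make the constraint explicit):

```latex
\max_{\pi} \; J(\pi)
\quad \text{s.t.} \quad \Lambda(\pi) \le \lambda,
\qquad \text{where} \quad
J(\pi) = \lim_{T \to \infty} \frac{1}{T}\,
         \mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} r_t\Big]
```

with $\Lambda(\pi)$ the long-run variance of the average reward and $\lambda$ the threshold. A standard route to such problems is the Lagrangian relaxation $\mathcal{L}(\pi, \mu) = J(\pi) - \mu \big(\Lambda(\pi) - \lambda\big)$ with multiplier $\mu \ge 0$, though the paper's exact construction may differ.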