Search Results for author: Qiwei Di

Found 4 papers, 0 papers with code

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

no code implementations • 16 Apr 2024 • Qiwei Di, Jiafan He, Quanquan Gu

Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).

Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

no code implementations • 14 Feb 2024 • Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu

Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes.
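The scaling behavior of the $\tilde{\mathcal O}(dB_*\sqrt{K})$ bound can be illustrated numerically. The sketch below simply evaluates the bound's leading term (the log factors hidden by the tilde are omitted, and the parameter values are arbitrary placeholders, not from the paper):

```python
import math

def regret_bound(d, B_star, K):
    """Leading term of the O(d * B_* * sqrt(K)) regret bound.

    d      -- dimension of the feature mapping in the linear transition kernel
    B_star -- upper bound on the optimal policy's total cumulative cost
    K      -- number of episodes
    """
    return d * B_star * math.sqrt(K)

# Sublinearity in K: doubling the number of episodes grows total regret
# by only a factor of sqrt(2), so per-episode regret shrinks.
r1 = regret_bound(d=10, B_star=5.0, K=1_000)
r2 = regret_bound(d=10, B_star=5.0, K=2_000)
ratio = r2 / r1  # ≈ sqrt(2)
```

Because the bound grows like $\sqrt{K}$ rather than $K$, the average regret per episode vanishes as $K \to \infty$, which is what makes the guarantee meaningful.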

Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning

no code implementations • 2 Oct 2023 • Qiwei Di, Heyang Zhao, Jiafan He, Quanquan Gu

However, only a limited number of works on offline RL with non-linear function approximation provide instance-dependent regret guarantees.

Offline RL • reinforcement-learning • +1

Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits

no code implementations • 2 Oct 2023 • Qiwei Di, Tao Jin, Yue Wu, Heyang Zhao, Farzad Farnoud, Quanquan Gu

Dueling bandits is a prominent framework for decision-making involving preferential feedback, a valuable feature that fits various applications involving human interaction, such as ranking, information retrieval, and recommendation systems.
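The preferential feedback at the heart of dueling bandits can be sketched as a pairwise comparison oracle. The snippet below is a generic illustration (not the paper's algorithm): it assumes a logistic/Bradley-Terry link, where the probability of preferring one arm over another depends on the gap between their feature vectors under a hidden parameter `theta`:

```python
import math
import random

def duel(theta, x, y, rng):
    """Simulate one round of preferential feedback.

    The learner proposes two arms with feature vectors x and y; the
    comparison oracle prefers x with probability sigma(<theta, x - y>),
    a logistic (Bradley-Terry) link.  Returns True if x is preferred.
    """
    margin = sum(t * (a - b) for t, a, b in zip(theta, x, y))
    p_x_wins = 1.0 / (1.0 + math.exp(-margin))
    return rng.random() < p_x_wins

rng = random.Random(0)
theta = [1.0, -0.5]              # hidden preference parameter (illustrative)
x, y = [1.0, 0.0], [0.0, 1.0]    # two candidate arms

# Margin <theta, x - y> = 1.5, so x should win sigma(1.5) ≈ 82% of duels.
wins = sum(duel(theta, x, y, rng) for _ in range(10_000))
win_rate = wins / 10_000
```

Note that the learner only ever observes the binary outcome of each duel, never a numeric reward, which is what distinguishes this setting from standard contextual bandits.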

Computational Efficiency • Decision Making • +2
