Search Results for author: Dan Qiao

Found 13 papers, 3 papers with code

Differentially Private Reinforcement Learning with Self-Play

no code implementations • 11 Apr 2024 • Dan Qiao, Yu-Xiang Wang

We study the problem of multi-agent reinforcement learning (multi-agent RL) with differential privacy (DP) constraints.

Multi-agent Reinforcement Learning reinforcement-learning

Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints

no code implementations • 2 Feb 2024 • Dan Qiao, Yu-Xiang Wang

We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints -- a new problem motivated by real-world applications where deployments of new policies are costly and the number of policy updates must be minimized.

Multi-agent Reinforcement Learning reinforcement-learning

OpenBA: An Open-sourced 15B Bilingual Asymmetric seq2seq Model Pre-trained from Scratch

1 code implementation • 19 Sep 2023 • Juntao Li, Zecheng Tang, Yuyang Ding, Pinzheng Wang, Pei Guo, Wangjie You, Dan Qiao, Wenliang Chen, Guohong Fu, Qiaoming Zhu, Guodong Zhou, Min Zhang

This report provides the main details to pre-train an analogous model, including pre-training data processing, Bilingual Flan data collection, the empirical observations that inspire our model architecture design, training objectives of different stages, and other enhancement techniques.

GameEval: Evaluating LLMs on Conversational Games

1 code implementation • 19 Aug 2023 • Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan

In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcoming the limitations of previous methods.

Question Answering

Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data

no code implementations • 24 Jun 2023 • Sunil Madhow, Dan Qiao, Ming Yin, Yu-Xiang Wang

Developing theoretical guarantees on the sample complexity of offline RL methods is an important step towards making data-hungry RL algorithms practically viable.

Offline RL reinforcement-learning

Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning

no code implementations • 18 May 2023 • Wenhao Li, Dan Qiao, Baoxiang Wang, Xiangfeng Wang, Bo Jin, Hongyuan Zha

The difficulty of appropriately assigning credit is particularly heightened in cooperative MARL with sparse reward, due to the concurrent time and structural scales involved.

Decision Making Multi-agent Reinforcement Learning +2

Near-Optimal Differentially Private Reinforcement Learning

no code implementations • 9 Dec 2022 • Dan Qiao, Yu-Xiang Wang

We close this gap for the JDP case by designing an $\epsilon$-JDP algorithm with a regret of $\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$ which matches the information-theoretic lower bound of non-private learning for all choices of $\epsilon > S^{1.5}A^{0.5}H^2/\sqrt{T}$.

reinforcement-learning Reinforcement Learning (RL)

SelfMix: Robust Learning Against Textual Label Noise with Self-Mixup Training

1 code implementation • COLING 2022 • Dan Qiao, Chenchen Dai, Yuyang Ding, Juntao Li, Qiang Chen, Wenliang Chen, Min Zhang

The conventional success of textual classification relies on annotated data, and the new paradigm of pre-trained language models (PLMs) still requires a few labeled data for downstream tasks.

text-classification Text Classification

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

no code implementations • 3 Oct 2022 • Dan Qiao, Yu-Xiang Wang

We study the problem of deployment efficient reinforcement learning (RL) with linear function approximation under the \emph{reward-free} exploration setting.

reinforcement-learning Reinforcement Learning (RL)

Doubly Fair Dynamic Pricing

no code implementations • 23 Sep 2022 • Jianyu Xu, Dan Qiao, Yu-Xiang Wang

We show that a doubly fair policy must be random to have higher revenue than the best trivial policy that assigns the same price to different groups.

Fairness

Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost

no code implementations • 13 Feb 2022 • Dan Qiao, Ming Yin, Ming Min, Yu-Xiang Wang

In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of $O(HSA \log\log T)$.

reinforcement-learning Reinforcement Learning (RL)

Novel Nussbaum-Type Function based Safe Adaptive Distributed Consensus Control with Arbitrary Unknown Control Direction

no code implementations • 24 Jan 2022 • Dan Qiao, Zhaoxia Peng, Guoguang Wen, TingWen Huang

This paper develops a novel saturated Nussbaum function to relax such limitations and proposes a Nussbaum function based control scheme for the consensus problem of multi-agent systems with arbitrary non-identical unknown control directions and safe control progress.
