no code implementations • 1 Mar 2024 • Mingyu Chen, Xuezhou Zhang
This paper initiates the study of scale-free learning in Markov Decision Processes (MDPs), where the scale of rewards/losses is unknown to the learner.
no code implementations • 10 Oct 2023 • Shuoguang Yang, Xuezhou Zhang, Mengdi Wang
Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization.
no code implementations • 3 Oct 2023 • Mingyu Chen, Xuezhou Zhang
We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses, where the algorithms have no prior knowledge of the sizes of the losses.
no code implementations • 21 Jun 2023 • Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang
In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning.
1 code implementation • 18 Nov 2022 • Shubham Kumar Bharti, Xuezhou Zhang, Adish Singla, Xiaojin Zhu
Instead, our defense mechanism sanitizes the backdoor policy by projecting observed states onto a 'safe subspace', estimated from a small number of interactions with a clean (non-triggered) environment.
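A minimal sketch of the sanitization idea described above, assuming the safe subspace is estimated by SVD of clean states and that projection onto it removes the trigger direction; all names, dimensions, and the SVD-based estimator are illustrative, not the paper's exact method.

```python
import numpy as np

def estimate_safe_subspace(clean_states, dim):
    # clean_states: (n, d) matrix of states collected from a
    # non-triggered environment; return an orthonormal basis
    # for the top-`dim` principal directions.
    centered = clean_states - clean_states.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim]  # (dim, d)

def sanitize(state, basis, mean):
    # Project a (possibly triggered) observed state onto the safe
    # subspace before handing it to the policy; any trigger component
    # orthogonal to the subspace is removed.
    centered = state - mean
    return basis.T @ (basis @ centered) + mean
```

In this toy setup, a trigger added along a direction never seen in clean data is projected away, while in-subspace states pass through unchanged.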
no code implementations • 30 Oct 2022 • Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang
To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.
no code implementations • 29 Jun 2022 • Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang
Online influence maximization aims to maximize the influence spread of a piece of content in a social network whose network model is unknown, by selecting a few seed nodes.
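The seed-selection step can be illustrated with a standard greedy loop over a known spread function; this toy sketch is ours, not the paper's bandit algorithm, where the spread would be unknown and estimated online from feedback.

```python
def greedy_seeds(spread, nodes, k):
    # Greedily add the node whose inclusion most increases the
    # estimated influence spread, until k seeds are chosen.
    seeds = []
    for _ in range(k):
        best = max((n for n in nodes if n not in seeds),
                   key=lambda n: spread(seeds + [n]))
        seeds.append(best)
    return seeds
```

With a monotone submodular spread function, this greedy rule enjoys the classic (1 - 1/e) approximation guarantee.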
no code implementations • 22 Jun 2022 • Shuoguang Yang, Xuezhou Zhang, Mengdi Wang
This paper studies the problem of distributed bilevel optimization over a network where agents can only communicate with neighbors, including examples from multi-task learning, multi-agent learning, and federated learning.
no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang
We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.
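A minimal Thompson Sampling loop over a small candidate pool illustrates the setting: the true sequence-to-function map is unknown and each query returns a costly, noisy measurement. The Gaussian posterior model and all parameters here are illustrative assumptions, not the TS-DE algorithm itself.

```python
import numpy as np

def thompson_sampling(true_values, noise_std, rounds, seed=0):
    # Independent Gaussian posterior per candidate: sample a plausible
    # landscape, query its maximizer, and update that arm's posterior
    # with the conjugate Gaussian rule.
    rng = np.random.default_rng(seed)
    n = len(true_values)
    mu = np.zeros(n)            # posterior means
    var = np.full(n, 10.0)      # broad prior variances
    for _ in range(rounds):
        sample = rng.normal(mu, np.sqrt(var))
        i = int(np.argmax(sample))
        y = true_values[i] + rng.normal(0.0, noise_std)  # noisy query
        prec = 1.0 / var[i] + 1.0 / noise_std**2
        mu[i] = (mu[i] / var[i] + y / noise_std**2) / prec
        var[i] = 1.0 / prec
    return int(np.argmax(mu))   # best candidate by posterior mean
```

Exploration here comes entirely from posterior sampling: candidates that have never been queried keep a broad posterior and are occasionally sampled high.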
no code implementations • 1 Jun 2022 • Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu
We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server.
1 code implementation • 29 May 2022 • Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a \emph{target task}.
no code implementations • 10 Feb 2022 • Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang
We approach this problem using Z-estimation theory and establish the following results. First, the FQE estimation error is asymptotically normal, with an explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning. Second, the finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by a function-class-dependent divergence that measures how the off-policy distribution shift intertwines with the function approximator.
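Schematically, the asymptotic normality claim takes the familiar Z-estimation form (our notation, not the paper's): $\sqrt{n}\,(\widehat{Q}_{\mathrm{FQE}} - Q^{\pi}) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$, where the variance $\sigma^2$ collects the three ingredients the abstract names: the tangent space of the function class at the ground truth, the reward structure, and the off-policy distribution shift.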
no code implementations • 31 Jan 2022 • Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang
Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.
1 code implementation • 31 Jan 2022 • Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.
no code implementations • ICLR 2022 • Masatoshi Uehara, Xuezhou Zhang, Wen Sun
This work studies the question of Representation Learning in RL: how can we learn a compact low-dimensional representation such that, on top of the representation, we can perform RL procedures such as exploration and exploitation in a sample-efficient manner?
no code implementations • 11 Jun 2021 • Xuezhou Zhang, Yiding Chen, Jerry Zhu, Wen Sun
Surprisingly, in this case, the knowledge of $\epsilon$ is necessary, as we show that being adaptive to unknown $\epsilon$ is impossible. This again contrasts with recent results on corruption-robust online RL and implies that robust offline RL is a strictly harder problem.
no code implementations • 23 Feb 2021 • Huajie Shao, Jun Wang, Haohong Lin, Xuezhou Zhang, Aston Zhang, Heng Ji, Tarek Abdelzaher
The algorithm is injected into a Conditional Variational Autoencoder (CVAE), allowing \textit{Apex} to control both (i) the order of keywords in the generated sentences (conditioned on the input keywords and their order), and (ii) the trade-off between diversity and accuracy.
no code implementations • 16 Feb 2021 • Amin Rakhsha, Xuezhou Zhang, Xiaojin Zhu, Adish Singla
We study black-box reward poisoning attacks against reinforcement learning (RL), in which an adversary aims to manipulate the rewards to mislead a sequence of RL agents with unknown algorithms to learn a nefarious policy in an environment unknown to the adversary a priori.
1 code implementation • 11 Feb 2021 • Xuezhou Zhang, Yiding Chen, Xiaojin Zhu, Wen Sun
Our first result shows that no algorithm can find a better than $O(\epsilon)$-optimal policy under our attack model.
no code implementations • 5 Sep 2020 • Yun-Shiuan Chuang, Xuezhou Zhang, Yuzhe Ma, Mark K. Ho, Joseph L. Austerweil, Xiaojin Zhu
To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states.
no code implementations • 16 Jun 2020 • Xuezhou Zhang, Shubham Kumar Bharti, Yuzhe Ma, Adish Singla, Xiaojin Zhu
Our TDim results provide the minimum number of samples needed for reinforcement learning, and we discuss their connections to standard PAC-style RL sample complexity and teaching-by-demonstration sample complexity results.
no code implementations • NeurIPS 2020 • Xuezhou Zhang, Yuzhe Ma, Adish Singla
To address these challenges, we propose the \textit{task-agnostic RL} framework: In the exploration phase, the agent first collects trajectories by exploring the MDP without the guidance of a reward function.
no code implementations • L4DC 2020 • Xuezhou Zhang, Xiaojin Zhu, Laurent Lessard
We study data poisoning attacks in the online learning setting, where training data arrive sequentially, and the attacker eavesdrops on the data stream and can contaminate the current data point to affect the online learning process.
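The threat model can be sketched in a few lines: an online learner runs SGD on squared loss while an eavesdropping attacker perturbs each incoming label, within a per-item budget, to pull the learned weight toward a target. The one-dimensional setup, the label-only perturbation, and all names here are our illustrative assumptions, not the paper's attack.

```python
import numpy as np

def poisoned_online_sgd(xs, ys, w_target, budget, lr=0.1):
    # Online learner: scalar linear model trained by SGD on squared loss.
    w = 0.0
    for x, y in zip(xs, ys):
        # Attacker: shift the label toward what the target model would
        # predict, subject to a per-item perturbation budget.
        y_poisoned = y + np.clip(w_target * x - y, -budget, budget)
        # Learner sees only the contaminated point.
        grad = 2.0 * (w * x - y_poisoned) * x
        w -= lr * grad
    return w
```

With a generous budget the learner converges to the attacker's target; with budget zero it recovers the clean model.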
6 code implementations • NeurIPS 2021 • Rishabh Agarwal, Levi Melnick, Nicholas Frosst, Xuezhou Zhang, Ben Lengerich, Rich Caruana, Geoffrey Hinton
They perform similarly to existing state-of-the-art generalized additive models in accuracy, but are more flexible because they are based on neural nets instead of boosted trees.
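The additive structure behind such models can be sketched directly: one small network per feature, summed to produce the prediction, so each feature's contribution can be plotted and inspected in isolation. The weights, sizes, and function names below are illustrative; the real model is trained jointly by backprop.

```python
import numpy as np

def feature_net(x, w1, b1, w2):
    # A tiny one-hidden-layer ReLU net applied to a single scalar feature,
    # producing that feature's shape function f_i(x_i).
    h = np.maximum(0.0, np.outer(x, w1) + b1)
    return h @ w2

def nam_predict(X, params, bias=0.0):
    # The prediction is a sum of per-feature shape functions,
    # which is what makes the model interpretable feature-by-feature.
    return bias + sum(feature_net(X[:, i], *p) for i, p in enumerate(params))
```

This is the same functional form as a classical GAM; the difference is that each shape function is a neural net rather than a boosted tree ensemble.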
no code implementations • ICML 2020 • Xuezhou Zhang, Yuzhe Ma, Adish Singla, Xiaojin Zhu
In reward-poisoning attacks against reinforcement learning (RL), an attacker can perturb the environment reward $r_t$ into $r_t+\delta_t$ at each step, with the goal of forcing the RL agent to learn a nefarious policy.
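A toy sketch of this attack surface, assuming a two-armed setting, an $\epsilon$-greedy Q-learning agent, and a simple sign-based perturbation rule; all of these modeling choices are ours, not the paper's attack construction.

```python
import numpy as np

def run_with_attack(true_means, target_action, delta_bound, steps=2000, seed=0):
    # An eps-greedy Q-learning agent on a 2-armed problem; the attacker
    # observes each reward r_t and adds a bounded delta_t that pushes the
    # target action's rewards up and the other action's rewards down.
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    for _ in range(steps):
        a = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(q))
        r = true_means[a] + rng.normal(0.0, 0.1)
        delta = delta_bound if a == target_action else -delta_bound
        q[a] += 0.1 * ((r + delta) - q[a])   # learner sees r_t + delta_t
    return int(np.argmax(q))                 # action the agent ends up preferring
```

When the per-step budget exceeds the reward gap, the agent is steered to the attacker's target action; with budget zero it learns the genuinely better arm.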
1 code implementation • NeurIPS 2019 • Yuzhe Ma, Xuezhou Zhang, Wen Sun, Xiaojin Zhu
We study a security threat to batch reinforcement learning and control where the attacker aims to poison the learned policy.
no code implementations • 5 Mar 2019 • Xuezhou Zhang, Xiaojin Zhu, Laurent Lessard
We study data poisoning attacks in the online setting where training items arrive sequentially, and the attacker may perturb the current item to manipulate online learning.
1 code implementation • 22 Oct 2018 • Xuezhou Zhang, Sarah Tan, Paul Koch, Yin Lou, Urszula Chajewska, Rich Caruana
In the first part of this paper, we generalize a state-of-the-art GAM learning algorithm based on boosted trees to the multiclass setting, and show that this multiclass algorithm outperforms existing GAM learning algorithms and sometimes matches the performance of full complexity models such as gradient boosted trees.
no code implementations • 15 Oct 2018 • Laurent Lessard, Xuezhou Zhang, Xiaojin Zhu
Our key insight is to formulate sequential machine teaching as a time-optimal control problem.
no code implementations • 25 Feb 2018 • Yuzhe Ma, Robert Nowak, Philippe Rigollet, Xuezhou Zhang, Xiaojin Zhu
We call a learner super-teachable if a teacher can trim down an i.i.d. training set while making the learner learn even better.
no code implementations • 24 Jan 2018 • Xuezhou Zhang, Xiaojin Zhu, Stephen J. Wright
The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning.