no code implementations • 21 Feb 2024 • Yuchen Liang, Peizhong Ju, Yingbin Liang, Ness Shroff
In this paper, we establish the convergence guarantee for substantially larger classes of distributions under discrete-time diffusion models and further improve the convergence rate for distributions with bounded support.
no code implementations • 14 Dec 2023 • Hao Chen, Abhishek Gupta, Yin Sun, Ness Shroff
In particular, we provide performance guarantees for the MMD-CUSUM test under $\alpha$-, $\beta$-, and $\phi$-mixing processes, which significantly expands its utility beyond the i.i.d.
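The CUSUM side of such a test follows the classical accumulate-and-threshold recursion. The sketch below is illustrative only: the per-sample drift score stands in for the kernel MMD statistic the paper actually uses (an assumption for this example), and the detector alarms once the running statistic crosses a threshold.

```python
def cusum(scores, threshold):
    """Sequential CUSUM test: accumulate positive drift, alarm at threshold.

    `scores` should tend to be negative before a change and positive after it
    (e.g., an MMD estimate minus a calibration constant, simplified here).
    Returns the alarm time, or None if no alarm is raised.
    """
    s = 0.0
    for t, g in enumerate(scores):
        s = max(0.0, s + g)  # reset at zero: ignore pre-change drift
        if s >= threshold:
            return t
    return None

# Pre-change scores drift down, post-change scores drift up.
scores = [-0.1] * 20 + [0.5] * 20
print(cusum(scores, threshold=3.0))  # → 25
```

With these synthetic scores, the statistic stays at zero before the change at index 20 and then climbs by 0.5 per step, crossing the threshold six steps later.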
no code implementations • 4 Oct 2023 • Hao Chen, Abhishek Gupta, Yin Sun, Ness Shroff
This paper studies Hoeffding's inequality for Markov chains under the generalized concentrability condition defined via integral probability metric (IPM).
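For context, the classical i.i.d. Hoeffding bound that the paper generalizes to Markov chains can be checked empirically. The Monte Carlo setup below is purely illustrative (uniform $[0,1]$ samples, arbitrary $n$ and $t$), not drawn from the paper.

```python
import math
import random

def hoeffding_bound(n, t):
    """Classical i.i.d. Hoeffding tail bound for [0,1]-valued variables:
    P(|empirical mean - E| >= t) <= 2 exp(-2 n t^2).
    The paper extends bounds of this type to Markov chains satisfying an
    IPM-based generalized concentrability condition."""
    return 2 * math.exp(-2 * n * t * t)

rng = random.Random(42)
n, t, trials = 200, 0.1, 2000
# Empirical frequency of the sample mean deviating from 0.5 by at least t.
exceed = sum(
    abs(sum(rng.random() for _ in range(n)) / n - 0.5) >= t
    for _ in range(trials)
) / trials
print(exceed, "<=", hoeffding_bound(n, t))
```

The empirical exceedance frequency sits well below the bound, as expected for an upper bound that holds for any mean-bounded distribution.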
no code implementations • 22 Jun 2023 • Yining Li, Peizhong Ju, Ness Shroff
To address this issue, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity.
no code implementations • 14 Jun 2023 • Ming Shi, Yingbin Liang, Ness Shroff
However, existing theoretical results have shown that learning in POMDPs is intractable in the worst case, where the main challenge lies in the lack of latent state information.
no code implementations • 10 Mar 2023 • Honghao Wei, Arnob Ghosh, Ness Shroff, Lei Ying, Xingyu Zhou
We study model-free reinforcement learning (RL) algorithms in episodic non-stationary constrained Markov Decision Processes (CMDPs), in which an agent aims to maximize the expected cumulative reward subject to a cumulative constraint on the expected utility (cost).
no code implementations • 12 Feb 2023 • Sen Lin, Peizhong Ju, Yingbin Liang, Ness Shroff
In particular, there is a lack of understanding on what factors are important and how they affect "catastrophic forgetting" and generalization performance.
no code implementations • 8 Feb 2023 • Ming Shi, Yingbin Liang, Ness Shroff
In many applications of Reinforcement Learning (RL), it is critically important that the algorithm performs safely, such that instantaneous hard constraints are satisfied at each step, and unsafe states and actions are avoided.
no code implementations • 8 Feb 2023 • Ming Shi, Yingbin Liang, Ness Shroff
Our lower bound indicates that, due to the fundamental challenge of switching costs in adversarial RL, the best achievable regret (whose dependency on $T$ is $\tilde{O}(\sqrt{T})$) in static RL with switching costs (as well as in adversarial RL without switching costs) is no longer achievable.
no code implementations • 23 Jun 2022 • Arnob Ghosh, Xingyu Zhou, Ness Shroff
To this end, we consider the episodic constrained Markov decision processes with linear function approximation, where the transition dynamics and the reward function can be represented as a linear function of some known feature mapping.
1 code implementation • NeurIPS 2021 • Wenbo Ren, Jia Liu, Ness Shroff
Here, a multi-wise comparison takes $m$ items as input and returns a (noisy) result about the best item (the winner feedback) or the order of these items (the full-ranking feedback).
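One simple way to simulate the winner feedback is a Plackett–Luce-style choice model, where the winner of an $m$-wise comparison is drawn with probability proportional to its latent weight. This model is an assumption for illustration; the paper's actual noise model may differ.

```python
import random

def winner_feedback(items, theta, rng):
    """Noisy m-wise comparison under a Plackett-Luce choice model
    (an illustrative assumption): the winner among `items` is drawn
    with probability proportional to its latent weight theta[i]."""
    weights = [theta[i] for i in items]
    return rng.choices(items, weights=weights, k=1)[0]

rng = random.Random(0)
theta = {0: 5.0, 1: 1.0, 2: 1.0}  # item 0 is the true best
wins = sum(winner_feedback([0, 1, 2], theta, rng) == 0 for _ in range(1000))
print(wins)  # item 0 wins roughly 5/7 of the comparisons
```

Under this model the best item wins with probability $5/7 \approx 0.714$, which is what the empirical frequency concentrates around.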
no code implementations • 26 Aug 2021 • Sayak Ray Chowdhury, Xingyu Zhou, Ness Shroff
In this paper, we study the problem of regret minimization in reinforcement learning (RL) under differential privacy constraints.
no code implementations • 6 Jul 2021 • Yuntian Deng, Xingyu Zhou, Baekjin Kim, Ambuj Tewari, Abhishek Gupta, Ness Shroff
To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression.
no code implementations • 11 Feb 2021 • Xingyu Zhou, Ness Shroff
In this paper, we consider the time-varying Bayesian optimization problem.
no code implementations • 28 Oct 2020 • Yuanjie Li, Esha Datta, Jiaxin Ding, Ness Shroff, Xin Liu
The demand for seamless Internet access under extreme user mobility, such as on high-speed trains and vehicles, has become a norm rather than an exception.
no code implementations • 13 Oct 2020 • Rahul Singh, Fang Liu, Yin Sun, Ness Shroff
We study a variant of the classical multi-armed bandit problem (MABP), which we call Multi-Armed Bandits with Dependent Arms.
no code implementations • 6 Jun 2020 • Rahul Singh, Fang Liu, Xin Liu, Ness Shroff
We show that this asymptotically optimal regret is upper-bounded as $O\left(|\chi(\mathcal{G})|\log T\right)$, where $|\chi(\mathcal{G})|$ is the domination number of $\mathcal{G}$.
no code implementations • 16 May 2019 • Fang Liu, Ness Shroff
Then we study a form of online attacks on bandit algorithms and propose an adaptive attack strategy that is effective against any bandit algorithm, without requiring knowledge of which algorithm is being used.
no code implementations • 28 Oct 2018 • Wenbo Ren, Jia Liu, Ness Shroff
Results in this paper provide reductions of up to a factor of $\rho n/k$ compared with the "$k$-exploration" algorithms that focus on finding the (PAC) best $k$ arms out of $n$ arms.
no code implementations • 23 May 2018 • Fang Liu, Zizhan Zheng, Ness Shroff
To fill this gap, we propose a variant of Thompson Sampling that attains the optimal regret in the directed setting within a logarithmic factor.
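The Beta–Bernoulli Thompson Sampling baseline that such a variant builds on is short enough to sketch: sample a mean from each arm's posterior and play the argmax. This is the standard algorithm, not the paper's directed-feedback variant.

```python
import random

def thompson_step(successes, failures, rng):
    """One round of Beta-Bernoulli Thompson Sampling (the standard
    baseline; the paper's graph-feedback variant modifies this):
    sample a mean from each arm's Beta posterior, play the argmax."""
    samples = [rng.betavariate(s + 1, f + 1)  # Beta(1,1) uniform prior
               for s, f in zip(successes, failures)]
    return samples.index(max(samples))

rng = random.Random(0)
means = [0.3, 0.7]          # true (unknown) Bernoulli arm means
succ, fail = [0, 0], [0, 0]
for _ in range(500):
    a = thompson_step(succ, fail, rng)
    if rng.random() < means[a]:
        succ[a] += 1
    else:
        fail[a] += 1
```

After 500 rounds, the posterior concentrates and the better arm (mean 0.7) receives the large majority of the pulls.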
no code implementations • 16 Apr 2018 • Fang Liu, Sinong Wang, Swapna Buccapatnam, Ness Shroff
We show that UCBoost($D$) enjoys $O(1)$ complexity for each arm per round, as well as a regret guarantee that is $1/e$-close to that of the kl-UCB algorithm.
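The comparison point here, the kl-UCB index, can be computed by bisection on the Bernoulli KL divergence: the index of an arm is the largest plausible mean consistent with its observations at confidence level $\log(t)$. This is the standard kl-UCB index, which UCBoost approximates with cheaper boosted bounds; the parameter values below are illustrative.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for safety."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, iters=50):
    """kl-UCB index: the largest q with pulls * KL(mean, q) <= log(t),
    found by bisection (KL(mean, .) is increasing on [mean, 1])."""
    target = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid  # still feasible: push the index up
        else:
            hi = mid
    return lo

print(kl_ucb_index(0.5, 100, 1000))  # ≈ 0.68
```

Each index evaluation costs a fixed number of bisection steps; UCBoost's contribution is replacing this per-arm search with closed-form $O(1)$ bounds at a controlled ($1/e$) loss in the regret constant.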
no code implementations • NeurIPS 2017 • Sinong Wang, Ness Shroff
It is well known that, for a linear program (LP) with constraint matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, the Alternating Direction Method of Multipliers (ADMM) converges globally and linearly at a rate $O((\|\mathbf{A}\|_F^2+mn)\log(1/\epsilon))$.
no code implementations • 8 Nov 2017 • Fang Liu, Swapna Buccapatnam, Ness Shroff
We consider stochastic multi-armed bandit problems with graph feedback, where the decision maker is allowed to observe the neighboring actions of the chosen action.
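The value of graph feedback is that one pull yields several observations. A minimal sketch (a generic UCB1 with side-observations, not the paper's algorithm) assumes that pulling arm $i$ also reveals the rewards of its out-neighbors, so their statistics update for free.

```python
import math
import random

def ucb_graph_feedback(means, graph, horizon, rng):
    """UCB1 with graph side-observations (an illustrative sketch):
    pulling an arm also reveals the rewards of its out-neighbors in
    `graph`, so those arms' statistics update without being played.
    Returns the average expected reward collected over the horizon."""
    n = len(means)
    counts, sums, total = [0] * n, [0.0] * n, 0.0
    for t in range(1, horizon + 1):
        idx = [
            sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])
            if counts[i] else float("inf")   # force one observation of each arm
            for i in range(n)
        ]
        arm = idx.index(max(idx))
        total += means[arm]                  # track expected reward of the play
        for j in {arm, *graph[arm]}:         # observe arm and its neighbors
            counts[j] += 1
            sums[j] += rng.random() < means[j]
    return total / horizon

rng = random.Random(1)
graph = {0: [1], 1: [2], 2: [0]}  # directed cycle of side-observations
avg = ucb_graph_feedback([0.2, 0.5, 0.8], graph, 2000, rng)
print(avg)  # close to the best arm's mean, 0.8
```

Because side-observations shrink every arm's confidence interval regardless of which arm is played, exploration is cheaper than in the standard bandit, which is exactly why graph-dependent regret bounds improve on the classical $O(n \log T)$.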
no code implementations • 8 Nov 2017 • Fang Liu, Joohyun Lee, Ness Shroff
The multi-armed bandit problem has been extensively studied under the stationary assumption.