no code implementations • 11 Feb 2024 • Yan Dai, Qiwen Cui, Simon S. Du
Markov Games (MGs) are an important model for Multi-Agent Reinforcement Learning (MARL).
no code implementations • 2 Feb 2024 • Kwangjun Ahn, ZhiYu Zhang, Yunbum Kook, Yan Dai
In this work, we provide a different perspective based on online learning that underscores the importance of Adam's algorithmic components.
no code implementations • 30 Jan 2023 • Yan Dai, Haipeng Luo, Chen-Yu Wei, Julian Zimmert
This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest.
no code implementations • 25 Jan 2023 • Jiatai Huang, Yan Dai, Longbo Huang
\texttt{Banker-OMD} leads to the first delayed scale-free adversarial MAB algorithm achieving $\widetilde{\mathcal O}(\sqrt{K}L(\sqrt T+\sqrt D))$ regret and the first delayed adversarial linear bandit algorithm achieving $\widetilde{\mathcal O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.
1 code implementation • 30 Jun 2022 • Xuanhan Wang, Yan Dai, Lianli Gao, Jingkuan Song
Specifically, each GCN model in ACFL not only learns action representations from single-form skeletons, but also adaptively mimics useful representations derived from other forms of skeletons.
no code implementations • 26 May 2022 • Yan Dai, Ruosong Wang, Simon S. Du
On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$.
no code implementations • 26 May 2022 • Yan Dai, Haipeng Luo, Liyu Chen
More importantly, we then find two significant applications: First, the analysis of FTPL turns out to be readily generalizable to delayed bandit feedback with order-optimal regret, while OMD methods exhibit extra difficulties (Jin et al., 2022).
no code implementations • 28 Jan 2022 • Jiatai Huang, Yan Dai, Longbo Huang
Specifically, we design an algorithm, \texttt{HTINF}: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori.
no code implementations • 26 Oct 2021 • Jiatai Huang, Yan Dai, Longbo Huang
We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays.
no code implementations • 4 Mar 2019 • Ruixue Liu, Baoyang Chen, Xiaoyu Guo, Yan Dai, Meng Chen, Zhijie Qiu, Xiaodong He
Imagination is one of the most important factors that make an artistic painting unique and impressive.
no code implementations • 24 May 2018 • Yi Yang, Andy Chen, Xiaoming Chen, Jiang Ji, Zhenyang Chen, Yan Dai
Implementing large-scale deep neural networks with high computational complexity on low-cost IoT devices may inevitably be constrained by limited computational resources, making it hard for the devices to respond in real time.