no code implementations • ICML 2020 • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu
We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.
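For context, the objective in this adversarial setting is to minimize regret against the best fixed policy in hindsight; in a standard formulation (our notation, not quoted from the paper),

\[
\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} \ell_t(\pi_t) \;-\; \min_{\pi} \sum_{t=1}^{T} \ell_t(\pi),
\]

where $\ell_t(\pi)$ denotes the expected loss of policy $\pi$ under the adversarially chosen loss function of episode $t$, and $\pi_t$ is the learner's policy in that episode.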
no code implementations • 19 Jun 2022 • Mingyang Liu, Asuman Ozdaglar, Tiancheng Yu, Kaiqing Zhang
Second, we show that regularized counterfactual regret minimization (\texttt{Reg-CFR}), with a variant of the optimistic mirror descent algorithm as its regret minimizer, achieves $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates for finding NE in EFGs.
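As a sketch of the regret minimizer involved: optimistic multiplicative weights is one instance of optimistic mirror descent with an entropy regularizer. The game, the step size, and the self-play loop below are illustrative choices for a sanity check, not the paper's Reg-CFR construction.

```python
import numpy as np

def omw_step(x, g_curr, g_prev, eta):
    """Optimistic hedge in recursive form: x_{t+1} ∝ x_t * exp(-eta*(2 g_t - g_{t-1})).

    The extra (g_t - g_{t-1}) term is the optimistic prediction of the next
    loss, which is what yields faster-than-O(1/sqrt(T)) rates in games.
    """
    x_new = x * np.exp(-eta * (2.0 * g_curr - g_prev))
    return x_new / x_new.sum()

# Self-play sanity check on matching pennies (a 2x2 zero-sum game).
G = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = np.ones(2) / 2, np.ones(2) / 2
gx_prev, gy_prev = np.zeros(2), np.zeros(2)
eta = 0.1
for _ in range(5000):
    gx, gy = G @ y, -G.T @ x          # loss gradients for the two players
    x = omw_step(x, gx, gx_prev, eta)
    y = omw_step(y, gy, gy_prev, eta)
    gx_prev, gy_prev = gx, gy
print(x, y)  # both iterates approach the uniform Nash equilibrium (0.5, 0.5)
```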
no code implementations • 30 May 2022 • Yu Bai, Chi Jin, Song Mei, Ziang Song, Tiancheng Yu
A conceptually appealing approach for learning Extensive-Form Games (EFGs) is to convert them to Normal-Form Games (NFGs).
no code implementations • 3 Feb 2022 • Yu Bai, Chi Jin, Song Mei, Tiancheng Yu
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors.
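A quick sanity check on the claimed factor, assuming (our reading of the result) that the improved bound is $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$:

\[
\frac{X^2A + Y^2B}{XA + YB} \;\le\; \frac{\max\{X,Y\}\,(XA + YB)}{XA + YB} \;=\; \max\{X,Y\},
\]

so the earlier bound exceeds the new one by at most a $\max\{X,Y\}$ factor, consistent with the stated improvement up to logarithmic factors.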
no code implementations • 27 Oct 2021 • Chi Jin, Qinghua Liu, Yuanhao Wang, Tiancheng Yu
We design a new class of fully decentralized algorithms -- V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that only scales with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions for the $i^{\rm th}$ player.
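A minimal single-agent sketch of a V-learning-style update (our simplification: the paper's exploration bonus and exact bandit subroutine are elided). The point the code illustrates is that each player maintains a bandit over only its own $A_i$ actions at every visited state, so nothing scales with the joint action space.

```python
import numpy as np

H, S, A = 5, 10, 4   # horizon, states, own actions (illustrative sizes)
eta = 0.1            # bandit learning rate (illustrative)

V = np.full((H + 1, S), float(H))    # optimistic value estimates
V[H] = 0.0                           # no value after the final step
weights = np.ones((H, S, A))         # exponential-weights bandit per (h, s)
counts = np.zeros((H, S), dtype=int)

def act(h, s):
    """Sample an action from this state's own-action bandit distribution."""
    p = weights[h, s] / weights[h, s].sum()
    return np.random.choice(A, p=p), p

def update(h, s, a, p, r, s_next):
    """Incremental value update after observing reward r and next state s'."""
    counts[h, s] += 1
    t = counts[h, s]
    alpha = (H + 1) / (H + t)         # the paper's learning-rate schedule
    target = r + V[h + 1, s_next]     # exploration bonus elided for brevity
    V[h, s] = min(float(H), (1 - alpha) * V[h, s] + alpha * target)
    # Feed an importance-weighted loss for the played action to the bandit.
    loss = (H - target) / H           # rescaled to roughly [0, 1]
    weights[h, s, a] *= np.exp(-eta * loss / p[a])
```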
no code implementations • 7 Jun 2021 • Chi Jin, Qinghua Liu, Tiancheng Yu
Modern reinforcement learning (RL) is commonly applied to practical problems with large state spaces, where function approximation must be deployed to approximate either the value function or the policy.
no code implementations • 5 Feb 2021 • Tiancheng Yu, Yi Tian, Jingzhao Zhang, Suvrit Sra
To our knowledge, this work provides the first provably efficient algorithms for vector-valued Markov games and our theoretical guarantees are near-optimal.
no code implementations • 28 Oct 2020 • Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra
We study online learning in unknown Markov games, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable.
no code implementations • 4 Oct 2020 • Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
However, for multi-agent reinforcement learning in Markov games, the best known sample complexity for model-based algorithms remains suboptimal, comparing unfavorably with recent model-free approaches.
no code implementations • NeurIPS 2020 • Yu Bai, Chi Jin, Tiancheng Yu
This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games.
no code implementations • 11 Jun 2020 • Chi-Ning Chou, Juspreet Singh Sandhu, Mien Brabeeba Wang, Tiancheng Yu
In this work, we present a streamlined three-step recipe to tackle the "chicken and egg" problem and give a general framework for analyzing stochastic dynamics in learning algorithms.
no code implementations • ICML 2020 • Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration and returns $\epsilon$-suboptimal policies for an arbitrary number of reward functions.
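For intuition, reward-free exploration follows a two-phase template: explore without observing any reward, then plan for arbitrary rewards on the learned model. A minimal tabular sketch (uniform exploration and Laplace smoothing stand in for the paper's deliberate exploration scheme):

```python
import numpy as np

S, A, H = 6, 3, 5  # illustrative sizes

def explore(num_episodes, step, s0=0):
    """Phase 1: collect transition counts with a uniform policy.

    `step(s, a)` is a user-supplied environment transition; no rewards
    are observed or needed in this phase.
    """
    counts = np.zeros((S, A, S))
    for _ in range(num_episodes):
        s = s0
        for _ in range(H):
            a = np.random.randint(A)
            s_next = step(s, a)
            counts[s, a, s_next] += 1
            s = s_next
    return (counts + 1) / (counts.sum(axis=2, keepdims=True) + S)  # smoothed model

def plan(P_hat, reward):
    """Phase 2: value iteration on the learned model for ANY reward[h][s][a]."""
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward[h] + P_hat @ V     # Q[s, a] = r_h(s, a) + sum_s' P_hat(s'|s,a) V(s')
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi
```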
no code implementations • 3 Dec 2019 • Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu
We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.
no code implementations • 22 Jul 2019 • Tiancheng Yu, Suvrit Sra
A Markov Decision Process (MDP) is a popular model for reinforcement learning.
no code implementations • 26 Jun 2019 • Tiancheng Yu, Xiyu Zhai, Suvrit Sra
The performance of a machine learning system is usually evaluated using i.i.d.\ observations with true labels.
no code implementations • NeurIPS 2018 • Yanjun Han, Jiantao Jiao, Chuan-Zheng Lee, Tsachy Weissman, Yihong Wu, Tiancheng Yu
For estimating the Shannon entropy of a distribution on $S$ elements with independent samples, [Paninski, 2004] showed that the sample complexity is sublinear in $S$, and [Valiant and Valiant, 2011] showed that consistent estimation of Shannon entropy is possible if and only if the sample size $n$ far exceeds $\frac{S}{\log S}$.
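For contrast with these rates: the naive plug-in (maximum-likelihood) estimator below is consistent only when $n \gg S$, which is exactly the gap the refined estimators close (an illustrative sketch, not the estimator from the paper):

```python
import numpy as np

def plugin_entropy(samples, S):
    """Plug-in Shannon entropy (in nats) of the empirical distribution."""
    counts = np.bincount(samples, minlength=S)
    p = counts / counts.sum()
    p = p[p > 0]                      # 0 * log 0 = 0 by convention
    return -np.sum(p * np.log(p))

samples = np.random.randint(0, 1000, size=5000)   # uniform over S = 1000
print(plugin_entropy(samples, 1000))  # below log(1000) ~ 6.91: biased low at small n
```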