Search Results for author: Liyu Chen

Found 16 papers, 1 paper with code

$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis

no code implementations • 4 Oct 2023 • Zishun Yu, Yunzhe Tao, Liyu Chen, Tao Sun, Hongxia Yang

Although policy-based RL methods dominate the literature on RL for program synthesis, the nature of program synthesis tasks hints at a natural alignment with value-based methods.

Code Generation · Program Synthesis · +2
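
For readers unfamiliar with the distinction, "value-based" means learning a state-action value function and acting greedily with respect to it, rather than directly parameterizing a policy. Below is a minimal tabular Q-learning sketch of that idea; it is purely illustrative and bears no relation to $\mathcal{B}$-Coder's actual architecture or training setup (all sizes and hyperparameters are hypothetical):

```python
import numpy as np

# Toy value-based update (tabular Q-learning). B-Coder itself works on
# program tokens with deep networks; this only illustrates the Bellman
# target that "value-based" refers to. All sizes are hypothetical.
n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate and discount factor

def q_update(s, a, r, s_next, done):
    # Move Q(s, a) toward the one-step target r + gamma * max_a' Q(s', a').
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2, done=False)  # single illustrative step
```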

Layered State Discovery for Incremental Autonomous Exploration

no code implementations • 7 Feb 2023 • Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta

We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states.
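
To get a feel for the magnitude of this bound, the snippet below plugs hypothetical values into the leading expression inside the $\tilde{\mathcal{O}}(\cdot)$ (constants and lower-order terms are ignored); note how the $\ln^{12}$ factor dominates even a small instance:

```python
import math

# Hypothetical instance sizes; the tilde-O hides constants and lower-order
# terms, so this is only the leading expression of the bound.
L, S_arrow, Gamma, A, eps = 10, 100, 5, 4, 0.1

# L * S^-> * Gamma * A * ln^12(S^->) / eps^2
bound = L * S_arrow * Gamma * A * math.log(S_arrow) ** 12 / eps**2
print(f"{bound:.2e}")  # ln(100)^12 alone is about 9e7, dominating the product
```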

Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path

no code implementations • 10 Oct 2022 • Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric

We also initiate the study of learning $\epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general.

Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback

no code implementations • 26 May 2022 • Yan Dai, Haipeng Luo, Liyu Chen

More importantly, we then find two significant applications: First, the analysis of FTPL turns out to be readily generalizable to delayed bandit feedback with order-optimal regret, while OMD methods exhibit extra difficulties (Jin et al., 2022).
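
As background, here is what Follow-the-Perturbed-Leader looks like in the simplest full-information experts setting: re-perturb cumulative losses with fresh noise each round and play the resulting leader. This is only the classic template; the paper's contribution is analyzing FTPL for adversarial MDPs with bandit feedback, which this sketch does not attempt:

```python
import numpy as np

rng = np.random.default_rng(0)

def ftpl_play(cum_losses, eta):
    # Follow-the-Perturbed-Leader: draw fresh exponential noise every round,
    # subtract it from the cumulative losses, and play the perturbed leader.
    noise = rng.exponential(scale=1.0, size=cum_losses.shape)
    return int(np.argmin(cum_losses - noise / eta))

# Toy run on a d-expert problem with full information and losses in [0, 1].
d, T, eta = 5, 1000, 0.1
cum_losses = np.zeros(d)
for t in range(T):
    expert = ftpl_play(cum_losses, eta)
    losses = rng.uniform(size=d)  # stand-in for an adversarial loss vector
    cum_losses += losses          # full information: all losses are observed
```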

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

no code implementations • 25 May 2022 • Liyu Chen, Haipeng Luo

We initiate the study of dynamic regret minimization for goal-oriented reinforcement learning, modeled as a non-stationary stochastic shortest path problem with changing cost and transition functions.

Reinforcement Learning (RL)

Policy Learning and Evaluation with Randomized Quasi-Monte Carlo

no code implementations • 16 Feb 2022 • Sebastien M. R. Arnold, Pierre L'Ecuyer, Liyu Chen, Yi-fan Chen, Fei Sha

Reinforcement learning constantly deals with hard integrals, for example when computing expectations in policy evaluation and policy iteration.

Continuous Control · Policy Gradient Methods · +1
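
The randomized quasi-Monte Carlo idea behind the paper can be demonstrated in a few lines: swap i.i.d. uniforms for a scrambled Sobol' sequence when estimating an expectation. The integrand below is an arbitrary stand-in, not anything from the paper:

```python
import numpy as np
from scipy.stats import qmc

def f(u):
    # Arbitrary stand-in integrand over [0, 1]^2. In policy evaluation the
    # uniforms would drive the randomness of actions and transitions.
    return np.sin(2 * np.pi * u[:, 0]) * u[:, 1] ** 2 + u.sum(axis=1)

n, d = 1024, 2  # power-of-two sample size keeps the Sobol' points balanced
mc_est = f(np.random.default_rng(0).uniform(size=(n, d))).mean()

sobol = qmc.Sobol(d=d, scramble=True, seed=0)  # randomized (scrambled) QMC
rqmc_est = f(sobol.random(n)).mean()
# Scrambling keeps the estimator unbiased, while the low-discrepancy points
# typically reduce its variance relative to plain Monte Carlo.
```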

Policy Optimization for Stochastic Shortest Path

no code implementations • 7 Feb 2022 • Liyu Chen, Haipeng Luo, Aviv Rosenberg

Policy optimization is among the most popular and successful reinforcement learning algorithms, and there is increasing interest in understanding its theoretical guarantees.

Reinforcement Learning (RL)

Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

no code implementations • 31 Jan 2022 • Liyu Chen, Rahul Jain, Haipeng Luo

We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints.

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

no code implementations • NeurIPS 2021 • Liyu Chen, Mehdi Jafarnia-Jahromi, Rahul Jain, Haipeng Luo

We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured.

Online Learning for Stochastic Shortest Path Model via Posterior Sampling

no code implementations • 9 Jun 2021 • Mehdi Jafarnia-Jahromi, Liyu Chen, Rahul Jain, Haipeng Luo

We consider the problem of online reinforcement learning for the Stochastic Shortest Path (SSP) problem modeled as an unknown MDP with an absorbing state.

Reinforcement Learning (RL)
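
The posterior-sampling template the paper builds on is the familiar one: sample a plausible model from the posterior, plan in it, act, and update. A schematic tabular sketch under a Dirichlet prior follows; every detail (sizes, the unit-cost SSP, the value-iteration planner) is a simplifying assumption of this sketch, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, GOAL = 5, 2, 4          # toy SSP; state 4 plays the role of the goal
counts = np.ones((S, A, S))   # Dirichlet(1) prior over each transition row
cost = np.full((S, A), 1.0)   # unit cost per step before reaching the goal
cost[GOAL] = 0.0

def solve_ssp(P, iters=200):
    # Value iteration on the sampled model, pinning V(goal) = 0.
    V = np.zeros(S)
    for _ in range(iters):
        Q = cost + P @ V      # Q[s, a] = c(s, a) + E_{s' ~ P}[V(s')]
        V = Q.min(axis=1)
        V[GOAL] = 0.0
    return Q.argmin(axis=1)

true_P = rng.dirichlet(np.ones(S), size=(S, A))   # the unknown environment
for episode in range(50):
    # Sample one plausible model from the posterior, then act greedily in it.
    P_hat = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                      for s in range(S)])
    policy = solve_ssp(P_hat)
    s = 0
    for _ in range(100):      # cap the episode length
        if s == GOAL:
            break
        a = policy[s]
        s_next = rng.choice(S, p=true_P[s, a])
        counts[s, a, s_next] += 1   # posterior update from the observed step
        s = s_next
```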

Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case

no code implementations • 10 Feb 2021 • Liyu Chen, Haipeng Luo

Our work strictly improves on (Rosenberg and Mansour, 2020) in the full-information setting, extends (Chen et al., 2020) from known to unknown transitions, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transitions.

Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications

no code implementations • 1 Feb 2021 • Liyu Chen, Haipeng Luo, Chen-Yu Wei

We resolve the long-standing "impossible tuning" issue for the classic expert problem and show that it is in fact possible to achieve regret $O\left(\sqrt{(\ln d)\sum_t \ell_{t, i}^2}\right)$ simultaneously for every expert $i$ in a $T$-round $d$-expert problem, where $\ell_{t, i}$ is the loss of expert $i$ in round $t$.
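
To see why such a bound is desirable, assume the standard boundedness $|\ell_{t,i}| \le 1$ (an assumption of this note, made for illustration); then the guarantee is never worse than the classic minimax rate, and much stronger for any expert with small cumulative squared loss:

```latex
% Assuming |\ell_{t,i}| \le 1, each term satisfies \ell_{t,i}^2 \le 1, hence
\sum_{t=1}^{T} \ell_{t,i}^2 \le T
\quad\Longrightarrow\quad
\sqrt{(\ln d)\sum_{t=1}^{T} \ell_{t,i}^2} \;\le\; \sqrt{T \ln d}
\quad \text{for every expert } i.
```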

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

no code implementations • 7 Dec 2020 • Liyu Chen, Haipeng Luo, Chen-Yu Wei

We study the stochastic shortest path problem with adversarial costs and known transition, and show that the minimax regret is $\widetilde{O}(\sqrt{DT^\star K})$ and $\widetilde{O}(\sqrt{DT^\star SA K})$ for the full-information setting and the bandit feedback setting respectively, where $D$ is the diameter, $T^\star$ is the expected hitting time of the optimal policy, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes.

Synthesized Policies for Transfer and Adaptation across Tasks and Environments

2 code implementations • NeurIPS 2018 • Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha

The ability to transfer in reinforcement learning is key to building an agent with general artificial intelligence.
