no code implementations • NeurIPS 2018 • Hexiang Hu, Liyu Chen, Boqing Gong, Fei Sha
The ability to transfer in reinforcement learning is key to building an agent with general artificial intelligence.
no code implementations • 11 Mar 2024 • Yufeng Zhang, Liyu Chen, Boyi Liu, Yingxiang Yang, Qiwen Cui, Yunzhe Tao, Hongxia Yang
Recent advances in reinforcement learning (RL) algorithms aim to enhance the performance of language models at scale.
no code implementations • 4 Oct 2023 • Zishun Yu, Yunzhe Tao, Liyu Chen, Tao Sun, Hongxia Yang
Despite policy-based RL methods dominating the literature on RL for program synthesis, the nature of program synthesis tasks hints at a natural alignment with value-based methods.
no code implementations • 7 Feb 2023 • Liyu Chen, Andrea Tirinzoni, Alessandro Lazaric, Matteo Pirotta
We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states.
no code implementations • 10 Oct 2022 • Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric
We also initiate the study of learning $\epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general.
no code implementations • 26 May 2022 • Yan Dai, Haipeng Luo, Liyu Chen
More importantly, we then find two significant applications: First, the analysis of FTPL turns out to be readily generalizable to delayed bandit feedback with order-optimal regret, while OMD methods exhibit extra difficulties (Jin et al., 2022).
no code implementations • 25 May 2022 • Liyu Chen, Haipeng Luo
We initiate the study of dynamic regret minimization for goal-oriented reinforcement learning modeled by a non-stationary stochastic shortest path problem with changing cost and transition functions.
no code implementations • 16 Feb 2022 • Sebastien M. R. Arnold, Pierre L'Ecuyer, Liyu Chen, Yi-fan Chen, Fei Sha
Reinforcement learning constantly deals with hard integrals, for example when computing expectations in policy evaluation and policy iteration.
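The expectations mentioned above are typically approximated by sampling. As a minimal illustration (not the paper's method), a plain Monte Carlo estimator of an expected return might look like the following; the return distribution here is a hypothetical example chosen only so the true expectation is known.

```python
import random

def mc_expectation(sample_return, num_samples=10000, seed=0):
    """Estimate E[G] by averaging i.i.d. sampled returns (plain Monte Carlo)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        total += sample_return(rng)
    return total / num_samples

# Hypothetical return distribution: reward 1 with probability 0.3, else 0,
# so the true expectation is 0.3.
est = mc_expectation(lambda rng: 1.0 if rng.random() < 0.3 else 0.0)
```

With $n$ i.i.d. samples the estimator's error shrinks at rate $O(1/\sqrt{n})$, which is exactly why variance-reduction techniques (such as the quasi-Monte Carlo ideas studied in this line of work) matter in policy evaluation.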
no code implementations • 7 Feb 2022 • Liyu Chen, Haipeng Luo, Aviv Rosenberg
Policy optimization is among the most popular and successful reinforcement learning algorithms, and there is increasing interest in understanding its theoretical guarantees.
no code implementations • 31 Jan 2022 • Liyu Chen, Rahul Jain, Haipeng Luo
We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints.
no code implementations • NeurIPS 2021 • Liyu Chen, Mehdi Jafarnia-Jahromi, Rahul Jain, Haipeng Luo
We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured.
no code implementations • 9 Jun 2021 • Mehdi Jafarnia-Jahromi, Liyu Chen, Rahul Jain, Haipeng Luo
We consider the problem of online reinforcement learning for the Stochastic Shortest Path (SSP) problem modeled as an unknown MDP with an absorbing state.
no code implementations • 10 Feb 2021 • Liyu Chen, Haipeng Luo
Our work strictly improves on (Rosenberg and Mansour, 2020) in the full-information setting, extends (Chen et al., 2020) from known transition to unknown transition, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transition.
no code implementations • 1 Feb 2021 • Liyu Chen, Haipeng Luo, Chen-Yu Wei
We resolve the long-standing "impossible tuning" issue for the classic expert problem and show that it is in fact possible to achieve regret $O\left(\sqrt{(\ln d)\sum_t \ell_{t, i}^2}\right)$ simultaneously for every expert $i$ in a $T$-round $d$-expert problem, where $\ell_{t, i}$ is the loss of expert $i$ in round $t$.
no code implementations • 7 Dec 2020 • Liyu Chen, Haipeng Luo, Chen-Yu Wei
We study the stochastic shortest path problem with adversarial costs and known transition, and show that the minimax regret is $\widetilde{O}(\sqrt{DT^\star K})$ and $\widetilde{O}(\sqrt{DT^\star SA K})$ for the full-information setting and the bandit feedback setting respectively, where $D$ is the diameter, $T^\star$ is the expected hitting time of the optimal policy, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes.