Search Results for author: Gellért Weisz

Found 9 papers, 1 paper with code

Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore

no code implementations11 Oct 2023 Gellért Weisz, András György, Csaba Szepesvári

We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features.
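Schematically, the assumption in the snippet can be written as follows (the feature map $\varphi$, weight vector $w_\pi$, and dimension $d$ are notation chosen here for illustration; the paper's exact formulation may differ):

```latex
% Linear q^pi-realizability: every policy's action-value function is linear
% in a known d-dimensional feature map.
\forall \pi \;\; \exists\, w_\pi \in \mathbb{R}^d :\quad
q^\pi(s,a) = \langle \varphi(s,a),\, w_\pi \rangle
\qquad \text{for all state--action pairs } (s,a).
```

This is weaker than the linear-MDP assumption, since only the value functions, not the transition dynamics, are required to be linear.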

Reinforcement Learning (RL)

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL

no code implementations NeurIPS 2023 Qinghua Liu, Gellért Weisz, András György, Chi Jin, Csaba Szepesvári

While policy optimization algorithms have played an important role in the recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited: existing results are either restricted to tabular MDPs or suffer from highly suboptimal sample complexity, especially in online RL where exploration is necessary.

Reinforcement Learning (RL)

Exponential Hardness of Reinforcement Learning with Linear Function Approximation

no code implementations25 Feb 2023 Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan, Csaba Szepesvári, Gellért Weisz

The rewards in this game are chosen such that if the learner achieves large reward, then the learner's actions can be used to simulate solving a variant of 3-SAT, where (a) each variable appears in a bounded number of clauses, and (b) if an instance has no solution, then it also has no assignment satisfying more than a $(1-\epsilon)$-fraction of clauses.

Learning Theory reinforcement-learning +1

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs

no code implementations27 Oct 2022 Gellért Weisz, András György, Tadashi Kozuno, Csaba Szepesvári

Our first contribution is a new variant of Approximate Policy Iteration (API), called Confident Approximate Policy Iteration (CAPI), which computes a deterministic stationary policy with an optimal error bound scaling linearly with the product of the effective horizon $H$ and the worst-case approximation error $\epsilon$ of the action-value functions of stationary policies.
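The error bound described in the snippet can be stated schematically as follows (a sketch consistent with the abstract; the exact constants, conditions, and the initial-state convention are in the paper):

```latex
% If action-value estimates of stationary policies are epsilon-accurate,
% CAPI's output policy is O(H * epsilon)-suboptimal.
\left\| \hat q^{\pi} - q^{\pi} \right\|_\infty \le \epsilon
\;\; \text{for all evaluated } \pi
\quad \Longrightarrow \quad
v^{\star} - v^{\bar\pi} \le O(H\epsilon),
```

where $\bar\pi$ is the deterministic stationary policy returned by CAPI and $H$ is the effective horizon.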

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions

no code implementations5 Oct 2021 Gellért Weisz, Csaba Szepesvári, András György

Furthermore, we show that the upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs with deterministic transitions and stochastic rewards, also under (ii).

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function

no code implementations3 Feb 2021 Gellért Weisz, Philip Amortila, Barnabás Janzer, Yasin Abbasi-Yadkori, Nan Jiang, Csaba Szepesvári

We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map.

Open-Ended Question Answering

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions

no code implementations3 Oct 2020 Gellért Weisz, Philip Amortila, Csaba Szepesvári

We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map that is available to the planner.
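The realizability assumption in this hardness result can be sketched as follows (the symbols $\varphi$ and $\theta^\star$ are illustrative notation, not necessarily the paper's):

```latex
% Linear realizability of the *optimal* action-value function only:
% a single weight vector suffices for q*, but nothing is assumed
% about the value functions of other policies.
\exists\, \theta^\star \in \mathbb{R}^d :\quad
q^{\star}(s,a) = \langle \varphi(s,a),\, \theta^\star \rangle
\qquad \text{for all } (s,a),
```

with the feature map $\varphi$ available to the planner. Note this is a much weaker assumption than the linear $q^\pi$-realizability used in the papers above, which is part of why planning becomes hard.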

LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration

1 code implementation ICML 2018 Gellért Weisz, András György, Csaba Szepesvári

We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution.
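A core idea in this setting is to evaluate candidate solver configurations by their runtime on sampled instances, with measurements capped so that a single pathological instance cannot dominate. The sketch below illustrates that idea only; the function names and toy "solvers" are made up and are not the paper's algorithm.

```python
# Sketch of capped-runtime configuration evaluation, in the spirit of the
# paper's setting; function names and the toy solvers are illustrative.

def capped_runtime(solver, instance, cap):
    """Measure a solver's runtime on an instance, but never beyond the cap."""
    return min(solver(instance), cap)

def evaluate_config(solver, instances, cap):
    """Mean capped runtime of one configuration over sampled instances."""
    return sum(capped_runtime(solver, inst, cap) for inst in instances) / len(instances)

# Toy "configurations": runtimes in seconds on 100 sampled instances.
instances = list(range(100))
fast = lambda inst: 1.0                                 # uniformly 1 s
heavy_tail = lambda inst: 0.5 if inst % 10 else 100.0   # 10% of runs are very slow

scores = {name: evaluate_config(f, instances, cap=10.0)
          for name, f in [("fast", fast), ("heavy_tail", heavy_tail)]}
best = min(scores, key=scores.get)  # capping tames the heavy tail's mean
```

Capping is what makes such procedures practical: without it, the heavy-tailed configuration's occasional 100-second runs would make its empirical mean both large and high-variance.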

Sample Efficient Deep Reinforcement Learning for Dialogue Systems with Large Action Spaces

no code implementations11 Feb 2018 Gellért Weisz, Paweł Budzianowski, Pei-Hao Su, Milica Gašić

Part of this effort is the policy optimisation task, which attempts to find a policy describing how to respond to humans, in the form of a function that takes the current state of the dialogue and returns the system's response.
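In value-based RL for dialogue, such a policy is typically greedy with respect to estimated action values. A minimal sketch (the action names are made up for illustration and are not from the paper):

```python
# Minimal sketch: a dialogue policy as a function of the current state's
# estimated action values. Action names are illustrative only.

def greedy_policy(q_values):
    """Return the system response whose estimated value is highest."""
    return max(q_values, key=q_values.get)

response = greedy_policy({"request_food": 0.2, "confirm_area": 0.9, "bye": 0.1})
```

Here the dialogue state enters implicitly through the Q-value estimates; with large action spaces, as in the paper's title, the challenge lies in estimating those values sample-efficiently.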

reinforcement-learning Reinforcement Learning (RL) +1
