no code implementations • 15 Feb 2024 • Yihan Du, Anna Winnicki, Gal Dalal, Shie Mannor, R. Srikant
In PO-RLHF, knowledge of the reward function is not assumed and the algorithm relies on trajectory-based comparison feedback to infer the reward function.
no code implementations • 17 Mar 2023 • Anna Winnicki, R. Srikant
We further show that lookahead can be implemented efficiently in the function approximation setting of linear Markov games, which are the counterpart of the much-studied linear MDPs.
Model-based Reinforcement Learning • Multi-agent Reinforcement Learning
no code implementations • 23 Jan 2023 • Anna Winnicki, R. Srikant
A common technique in reinforcement learning is to evaluate the value function from Monte Carlo simulations of a given policy, and use the estimated value function to obtain a new policy which is greedy with respect to the estimated value function.
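The evaluate-then-improve loop described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the two-state MDP `P`, the discount `GAMMA`, the episode count, the truncation horizon, and all function names are assumptions made up for the example, and the greedy step assumes the transition model is known.

```python
import random

# Hypothetical 2-state, 2-action MDP, invented for illustration only.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.9, 1, 1.0), (0.1, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def step(state, action, rng):
    """Sample (next_state, reward) from the transition model."""
    u = rng.random()
    cumulative = 0.0
    for prob, next_state, reward in P[state][action]:
        cumulative += prob
        if u <= cumulative:
            return next_state, reward
    return next_state, reward  # guard against floating-point round-off


def mc_evaluate(policy, rng, episodes=200, horizon=50):
    """Estimate the value of `policy` by averaging truncated discounted returns."""
    values = {}
    for start in P:
        total = 0.0
        for _ in range(episodes):
            s, discount, ret = start, 1.0, 0.0
            for _ in range(horizon):
                s, r = step(s, policy[s], rng)
                ret += discount * r
                discount *= GAMMA
            total += ret
        values[start] = total / episodes
    return values


def q_value(values, s, a):
    """One-step backup of the estimated values through the known model."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in P[s][a])


def greedy(values):
    """Policy that is greedy with respect to the estimated value function."""
    return {s: max(P[s], key=lambda a: q_value(values, s, a)) for s in P}


rng = random.Random(0)      # fixed seed so runs are repeatable
policy = {0: 0, 1: 0}       # arbitrary starting policy
for _ in range(3):
    values = mc_evaluate(policy, rng)  # Monte Carlo policy evaluation
    policy = greedy(values)            # greedy policy improvement
```

On this toy problem the loop settles on taking action 1 in both states. With noisier estimates or larger state spaces the greedy step can pick worse actions, which is the kind of effect performance guarantees for such schemes have to account for.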
no code implementations • 13 Oct 2022 • Anna Winnicki, R. Srikant
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes (MDPs) that combines stochastic approximation algorithms with techniques useful for very large MDPs, including lookahead, function approximation, and gradient descent.
no code implementations • 28 Sep 2021 • Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant
Techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation.
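A tabular sketch of one common formulation of these two ideas follows. It makes several assumptions: H-step lookahead is taken to mean acting greedily with respect to the value estimate after H-1 applications of the Bellman optimality operator, and m-step rollout is taken to mean m applications of the policy's Bellman operator. The MDP `P`, the discount `GAMMA`, and every function name are hypothetical, and function approximation is left out for brevity.

```python
# Hypothetical 2-state, 2-action MDP, invented for illustration only.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.9, 1, 1.0), (0.1, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(values, s, a):
    """One-step backup of `values` for taking action a in state s."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in P[s][a])


def bellman_optimal(values):
    """One application of the Bellman optimality operator."""
    return {s: max(q_value(values, s, a) for a in P[s]) for s in P}


def bellman_policy(values, policy):
    """One application of the Bellman operator of a fixed policy."""
    return {s: q_value(values, s, policy[s]) for s in P}


def lookahead_policy(values, H):
    """Return the H-step lookahead policy and the H-1 step backed-up values."""
    for _ in range(H - 1):
        values = bellman_optimal(values)
    policy = {s: max(P[s], key=lambda a: q_value(values, s, a)) for s in P}
    return policy, values


def m_step_rollout(values, policy, m):
    """Evaluate `policy` approximately by m backups starting from `values`."""
    for _ in range(m):
        values = bellman_policy(values, policy)
    return values


initial = {0: 0.0, 1: 0.0}                       # crude starting estimate
policy, backed_up = lookahead_policy(initial, H=2)
rolled_out = m_step_rollout(backed_up, policy, m=3)
```

Larger `H` and `m` push the estimate closer to the true value of the chosen policy at the cost of more computation per iteration.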
no code implementations • 29 Jan 2021 • Joseph Lubars, Anna Winnicki, Michael Livesay, R. Srikant
We consider Markov Decision Processes (MDPs) in which every stationary policy induces the same graph structure for the underlying Markov chain and further, the graph has the following property: if we replace each recurrent class by a node, then the resulting graph is acyclic.
no code implementations • 23 Oct 2019 • Mariola Ndrio, Anna Winnicki, Subhonmesh Bose
We analyze pricing mechanisms in electricity markets with AC power flow equations that define a nonconvex feasible set for the economic dispatch problem.