2 code implementations • NeurIPS 2023 • Elita A. Lobo, Cyrus Cousins, Yair Zick, Marek Petrik
The percentile criterion is approximately solved by constructing an ambiguity set that contains the true model with high probability and optimizing the policy for the worst model in the set.
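The worst-case step over a simple ambiguity set can be made concrete. Below is a minimal Python sketch (not the paper's algorithm; the function name and the sort-and-shift scheme are illustrative) of the worst-case expected value over an $L_1$ ball around a nominal transition distribution:

```python
import numpy as np

def worst_case_l1(p_nominal, values, budget):
    """Minimize q @ values over distributions q with ||q - p_nominal||_1 <= budget.

    Shifts probability mass from high-value outcomes to the lowest-value one,
    which is optimal for an L1 ball intersected with the probability simplex.
    """
    p = np.asarray(p_nominal, dtype=float)
    v = np.asarray(values, dtype=float)
    order = np.argsort(v)                    # outcomes from lowest to highest value
    q = p.copy()
    add = min(budget / 2.0, 1.0 - q[order[0]])
    q[order[0]] += add                       # pile mass onto the worst outcome
    remaining = add
    for i in order[::-1]:                    # take the same mass from the best outcomes
        if i == order[0]:
            continue
        take = min(remaining, q[i])
        q[i] -= take
        remaining -= take
        if remaining <= 0:
            break
    return q, float(q @ v)

# nominal model is 50/50 between a bad (value 0) and a good (value 1) outcome
q, val = worst_case_l1([0.5, 0.5], [0.0, 1.0], budget=0.4)  # val == 0.3
```

With a budget of 0.4, the adversary moves 0.2 of probability mass from the good outcome to the bad one, dropping the expected value from 0.5 to 0.3.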
no code implementations • 6 Apr 2024 • Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju
Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive.
no code implementations • 2 Jun 2023 • Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz
We study how to make decisions that minimize Bayesian regret in offline linear bandits.
no code implementations • NeurIPS 2023 • Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik
However, we show that these popular decompositions for Conditional Value-at-Risk (CVaR) and Entropic Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level.
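As a refresher on the risk measure involved, a minimal empirical CVaR for a batch of sampled returns (lower-tail convention; a sample-count approximation rather than the exact Rockafellar-Uryasev formula, and not code from the paper) might look like:

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail).

    A simple sample-based approximation of CVaR_alpha; the exact definition
    weights the boundary sample fractionally.
    """
    x = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))
    return float(x[:k].mean())

# the worst half of {1, 2, 3, 4} is {1, 2}, so CVaR_0.5 is 1.5
risk = empirical_cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5)
```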
1 code implementation • 20 Dec 2022 • Qiuhao Wang, Chin Pang Ho, Marek Petrik
In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs.
no code implementations • 21 Sep 2022 • Julien Grand-Clément, Marek Petrik
Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.
no code implementations • 9 Sep 2022 • Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel
We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs.
no code implementations • 27 May 2022 • Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty.
no code implementations • NeurIPS 2021 • Bahram Behzadian, Marek Petrik, Chin Pang Ho
Robust Markov decision processes (RMDPs) are a useful building block of robust reinforcement learning algorithms but can be hard to solve.
no code implementations • 11 Jun 2021 • Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg
Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
no code implementations • 4 Jan 2021 • Mostafa Hussein, Brendan Crowe, Marek Petrik, Momotaz Begum
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task.
no code implementations • 30 Nov 2020 • Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik
In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure.
1 code implementation • NeurIPS 2020 • Daniel S. Brown, Scott Niekum, Marek Petrik
Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function.
no code implementations • 20 Jun 2020 • Reazul Hasan Russel, Bahram Behzadian, Marek Petrik
Having a perfect model to compute the optimal policy is often infeasible in reinforcement learning.
1 code implementation • 16 Jun 2020 • Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
Robust Markov decision processes (MDPs) make it possible to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities.
no code implementations • 6 Jun 2020 • Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.
1 code implementation • 6 Jun 2020 • Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik
In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms.
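One common presentation of a gradient TD update (the GTD2 variant, written here as an illustration of the family; this is not the proximal algorithm the paper introduces) is:

```python
import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update for linear value estimation v(s) ~ theta @ phi(s).

    theta: value weights; w: auxiliary weights tracking the expected
    TD-error correction term used to form an unbiased gradient estimate.
    """
    delta = reward + gamma * theta @ phi_next - theta @ phi   # TD error
    w_new = w + beta * (delta - phi @ w) * phi                # track correction term
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    return theta_new, w_new

theta, w = np.zeros(1), np.zeros(1)
theta, w = gtd2_step(theta, w, np.array([1.0]), np.array([0.0]),
                     reward=1.0, gamma=0.9, alpha=0.1, beta=0.1)
```

Note that the value weights move only through the auxiliary estimate `phi @ w`, which is what distinguishes gradient TD methods from plain TD(0).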
no code implementations • 4 Dec 2019 • Reazul Hasan Russel, Bahram Behzadian, Marek Petrik
Our proposed method computes a weight parameter from the value functions, and these weights then drive the shape of the ambiguity sets.
no code implementations • 23 Oct 2019 • Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho
We then propose new algorithms that minimize the span of ambiguity sets defined by weighted $L_1$ and $L_\infty$ norms.
no code implementations • 17 Apr 2019 • Reazul H. Russel, Tianyi Gu, Marek Petrik
Optimism about the poorly understood states and actions is the main driving force of exploration for many provably-efficient reinforcement learning algorithms.
1 code implementation • NeurIPS 2019 • Marek Petrik, Reazul Hasan Russell
Robust MDPs (RMDPs) can be used to compute policies with provable worst-case guarantees in reinforcement learning.
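To make the worst-case guarantee concrete, here is a toy robust value iteration in which the ambiguity set is simplified to a finite set of candidate transition models (a sketch with illustrative names and shapes, not the paper's Bayesian ambiguity sets):

```python
import numpy as np

def robust_value_iteration(models, rewards, gamma=0.5, iters=100):
    """Robust VI: maximize over actions, minimize over candidate models.

    models: list of arrays P with P[a, s, s'] = transition probability
    rewards: array R with R[a, s] = immediate reward
    """
    n_actions, n_states = rewards.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = np.stack([
            rewards[a] + gamma * np.min([P[a] @ v for P in models], axis=0)
            for a in range(n_actions)
        ])
        v = q.max(axis=0)
    return v, q.argmax(axis=0)

# two states, one action; the adversary may keep us in s0 or push us to the 0-reward s1
stay = np.array([[[1.0, 0.0], [0.0, 1.0]]])
leave = np.array([[[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[1.0, 0.0]])
v, policy = robust_value_iteration([stay, leave], R)
```

Here the worst-case value of the rewarding state is just its immediate reward, since the adversary can always move the agent to the absorbing zero-reward state.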
no code implementations • NeurIPS 2018 • Andrea Tirinzoni, Marek Petrik, Xiangli Chen, Brian Ziebart
What policy should be employed in a Markov decision process with uncertain parameters?
no code implementations • 15 Nov 2018 • Reazul Hasan Russel, Marek Petrik
Robustness is important for sequential decision making in a stochastic dynamic environment with uncertain probabilistic parameters.
no code implementations • 19 Sep 2018 • Alexander Brown, Marek Petrik
We propose to use boosted regression trees as a way to compute human-interpretable solutions to reinforcement learning problems.
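As a generic illustration of the regressor class involved (not the paper's RL algorithm; all names here are made up for the sketch), a minimal gradient-boosted ensemble of depth-1 regression stumps on a 1-D input can be written as:

```python
import numpy as np

def fit_stump(x, y):
    """Best single-threshold split of 1-D x minimizing squared error of y."""
    best_err, best = np.inf, None
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_err, best = err, (t, left.mean(), right.mean())
    return best

def boost_stumps(x, y, rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current residual."""
    pred = np.zeros_like(y, dtype=float)
    ensemble = []
    for _ in range(rounds):
        t, lo, hi = fit_stump(x, y - pred)
        ensemble.append((t, lo, hi))
        pred += lr * np.where(x <= t, lo, hi)
    return ensemble, pred

x = np.arange(10, dtype=float)
y = (x >= 5).astype(float)            # a step function
_, pred = boost_stumps(x, y)
```

The resulting ensemble of thresholds is directly readable as a sequence of if-then rules, which is the interpretability argument for tree-based policies.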
no code implementations • ICML 2018 • Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
The first algorithm uses a homotopy continuation method to compute updates for $L_1$-constrained $(s,a)$-rectangular ambiguity sets.
no code implementations • 14 Jun 2017 • Adam N. Elmachtoub, Ryan McNellis, Sechan Oh, Marek Petrik
We propose a new method for the contextual bandit problem that is simple, practical, and can be applied with little or no domain expertise.
no code implementations • 12 Apr 2017 • Bence Cserna, Marek Petrik, Reazul Hasan Russel, Wheeler Ruml
Multi-armed bandits are a quintessential machine learning problem requiring the balancing of exploration and exploitation.
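The exploration-exploitation balance mentioned above is often introduced via the UCB1 index rule; a minimal sketch (illustrative names, deterministic toy rewards, not the algorithm studied in the paper) follows:

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then the arm maximizing mean + exploration bonus."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t                      # initialization: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return counts

# deterministic toy rewards: arm 1 is clearly better, so it should dominate the pulls
counts = ucb1(lambda a: [0.1, 0.9][a], n_arms=2, horizon=200)
```

The bonus term shrinks as an arm accumulates pulls, so under-explored arms are periodically revisited even when their empirical mean is low.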
no code implementations • NeurIPS 2016 • Marek Petrik, Yin-Lam Chow, Mohammad Ghavamzadeh
We show that our formulation is NP-hard and propose an approximate algorithm.
no code implementations • 19 Jun 2016 • Amit Dhurandhar, Sechan Oh, Marek Petrik
We propose a method for building an interpretable recommender system for personalizing online content and promotions.
no code implementations • 16 Oct 2015 • Stephen Becker, Ban Kawas, Marek Petrik, Karthikeyan N. Ramamurthy
While maintaining computational efficiency, our models provide robust solutions that are more accurate (relative to solutions of uncompressed least squares) than those of classical compressed variants.
no code implementations • NeurIPS 2014 • Marek Petrik, Dharmashankar Subramanian
We describe how to use robust Markov decision processes for value function approximation with state aggregation.
no code implementations • 15 Jan 2014 • Marek Petrik, Shlomo Zilberstein
Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement.
no code implementations • 26 Sep 2013 • Marek Petrik, Dharmashankar Subramanian, Janusz Marecki
We propose solution methods for previously unsolved constrained MDPs in which actions can continuously modify the transition probabilities within some acceptable sets.
no code implementations • NeurIPS 2009 • Marek Petrik, Shlomo Zilberstein
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds.
no code implementations • NeurIPS 2008 • Marek Petrik, Bruno Scherrer
We thus propose another justification: when the rewards are received only sporadically (as it is the case in Tetris), we can derive tighter bounds, which support a significant performance increase with a decrease in the discount factor.