Search Results for author: Marek Petrik

Found 35 papers, 6 papers with code

Percentile Criterion Optimization in Offline Reinforcement Learning

2 code implementations NeurIPS 2023 Elita A. Lobo, Cyrus Cousins, Yair Zick, Marek Petrik

The percentile criterion is approximately solved by constructing an ambiguity set that contains the true model with high probability and optimizing the policy for the worst model in the set.

Decision Making · reinforcement-learning
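
The approach described above can be illustrated with a small worst-case value-iteration loop: given a finite set of plausible transition models (for example, posterior samples), back up each state-action pair against the worst model in the set. This is a minimal sketch of the general idea under an (s, a)-rectangular relaxation over sampled models, not the algorithm proposed in the paper; names and array shapes are assumptions.

```python
import numpy as np

def robust_value_iteration(P_models, R, gamma=0.95, n_iters=500):
    """Worst-case value iteration over a finite set of plausible models.

    P_models: (K, S, A, S) array of K sampled transition models.
    R:        (S, A) reward matrix.
    Returns the robust value function under an (s, a)-rectangular relaxation.
    """
    K, S, A, _ = P_models.shape
    v = np.zeros(S)
    for _ in range(n_iters):
        # Expected next-state value under every model: (K, S, A).
        q = R[None, :, :] + gamma * np.einsum("ksat,t->ksa", P_models, v)
        # Worst model per (s, a), then greedy action per state.
        v = q.min(axis=0).max(axis=1)
    return v
```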

Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

no code implementations 6 Apr 2024 Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju

Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive.

Data Poisoning · Off-policy evaluation

Bayesian Regret Minimization in Offline Bandits

no code implementations 2 Jun 2023 Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz

We study how to make decisions that minimize Bayesian regret in offline linear bandits.
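
For intuition, Bayesian regret in a one-shot offline linear bandit can be estimated from posterior samples of the reward parameter; in that simplest setting, the regret-minimizing action is the one with the highest posterior-mean reward. The sketch below is illustrative background with assumed names and shapes, not the paper's analysis.

```python
import numpy as np

def bayes_regret(action_features, theta_samples, chosen):
    """Monte-Carlo Bayesian regret of committing to one action.

    action_features: (n_actions, d) feature vectors.
    theta_samples:   (m, d) posterior draws of the reward parameter.
    """
    rewards = theta_samples @ action_features.T            # (m, n_actions)
    return float((rewards.max(axis=1) - rewards[:, chosen]).mean())

def min_bayes_regret_action(action_features, theta_samples):
    # In this one-shot setting, minimizing Bayesian regret is equivalent to
    # maximizing the posterior-mean reward.
    return int(np.argmax((theta_samples @ action_features.T).mean(axis=0)))
```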

On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

no code implementations NeurIPS 2023 Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik

However, we show that these popular decompositions for Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level.

Reinforcement Learning (RL)
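
As background for the risk measures discussed above, CVaR at level alpha is the expected return over the worst alpha-fraction of outcomes. A simple empirical approximation looks like this:

```python
import numpy as np

def empirical_cvar(returns, alpha=0.1):
    """Approximate CVaR_alpha of sampled returns: the mean of the worst alpha-fraction.
    (The exact empirical CVaR puts a fractional weight on the quantile atom.)"""
    x = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))
    return float(x[:k].mean())
```

The paper's point concerns how such static measures interact with dynamic-programming decompositions of the return, not the computation of the measure itself.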

Policy Gradient in Robust MDPs with Global Convergence Guarantee

1 code implementation 20 Dec 2022 Qiuhao Wang, Chin Pang Ho, Marek Petrik

In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs.

Policy Gradient Methods

On the convex formulations of robust Markov decision processes

no code implementations 21 Sep 2022 Julien Grand-Clément, Marek Petrik

Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.

RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk

no code implementations 9 Sep 2022 Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel

We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs.

Reinforcement Learning (RL) · Safe Reinforcement Learning
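
The entropic risk measure that underlies this line of work has a simple closed form that is easy to compute from return samples; the snippet below is generic background, not the paper's algorithm. The parameter beta controls risk aversion: the measure recovers the mean as beta goes to 0 and the minimum return as beta grows.

```python
import numpy as np

def entropic_risk(returns, beta):
    """ERM_beta(X) = -(1/beta) * log E[exp(-beta * X)] for a return sample X.
    Recovers the mean as beta -> 0 and the minimum return as beta -> infinity."""
    x = np.asarray(returns, dtype=float)
    m = x.min()                       # shift for numerical stability
    return float(m - np.log(np.mean(np.exp(-beta * (x - m)))) / beta)
```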

Robust Phi-Divergence MDPs

no code implementations 27 May 2022 Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty.

Fast Algorithms for $L_\infty$-constrained S-rectangular Robust MDPs

no code implementations NeurIPS 2021 Bahram Behzadian, Marek Petrik, Chin Pang Ho

Robust Markov decision processes (RMDPs) are a useful building block of robust reinforcement learning algorithms but can be hard to solve.

reinforcement-learning · Reinforcement Learning (RL)

Policy Gradient Bayesian Robust Optimization for Imitation Learning

no code implementations 11 Jun 2021 Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg

Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.

Imitation Learning

Robust Maximum Entropy Behavior Cloning

no code implementations 4 Jan 2021 Mostafa Hussein, Brendan Crowe, Marek Petrik, Momotaz Begum

Imitation learning (IL) algorithms use expert demonstrations to learn a specific task.

Imitation Learning

Soft-Robust Algorithms for Batch Reinforcement Learning

no code implementations 30 Nov 2020 Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik

In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure.

Decision Making · reinforcement-learning +1
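
Concretely, the percentile criterion mentioned above scores a policy by the return level it exceeds with probability 1 - delta over the model posterior, which reduces to a quantile when the posterior is represented by samples. A minimal illustration (names are assumptions; soft-robust objectives instead blend the mean with such a robust term):

```python
import numpy as np

def percentile_criterion(returns_by_model, delta=0.05):
    """Score a policy by the return it exceeds with probability 1 - delta
    when the model posterior is represented by sampled models."""
    return float(np.quantile(np.asarray(returns_by_model, dtype=float), delta))
```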

Bayesian Robust Optimization for Imitation Learning

1 code implementation NeurIPS 2020 Daniel S. Brown, Scott Niekum, Marek Petrik

Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function.

Imitation Learning · reinforcement-learning +1
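
The contrast drawn above between maxmin and risk-neutral (mean/MAP) objectives can be made concrete when a policy's value is linear in the reward parameters: evaluate each candidate policy's expected feature counts against posterior samples of the reward weights and compare the worst-case and mean criteria. This is an illustrative sketch under that linearity assumption, not the BROIL algorithm itself.

```python
import numpy as np

def compare_criteria(policy_feature_counts, reward_samples):
    """policy_feature_counts: (n_policies, d) expected feature counts per policy.
    reward_samples:          (n_samples, d) posterior draws of reward weights."""
    values = policy_feature_counts @ reward_samples.T        # (n_policies, n_samples)
    maxmin_policy = int(values.min(axis=1).argmax())         # adversarial / robust pick
    risk_neutral_policy = int(values.mean(axis=1).argmax())  # mean-reward pick
    return maxmin_policy, risk_neutral_policy
```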

Partial Policy Iteration for L1-Robust Markov Decision Processes

1 code implementation 16 Jun 2020 Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

Robust Markov decision processes (MDPs) make it possible to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities.
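
The inner problem that L1-robust methods solve repeatedly is a worst-case expectation over an L1 ball around the nominal transition probabilities. The standard greedy solution to this (s, a)-rectangular subproblem is sketched below; it is background for the setting, not the paper's partial policy iteration scheme.

```python
import numpy as np

def worst_case_l1(p_bar, v, kappa):
    """Minimize p @ v over the simplex subject to ||p - p_bar||_1 <= kappa."""
    p = np.asarray(p_bar, dtype=float).copy()
    # Shift as much mass as allowed onto the lowest-value next state.
    i_min = int(np.argmin(v))
    eps = min(kappa / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    # Remove the same total mass from the highest-value states.
    to_remove = eps
    for j in np.argsort(v)[::-1]:          # highest values first
        if to_remove <= 0:
            break
        if j == i_min:
            continue
        take = min(p[j], to_remove)
        p[j] -= take
        to_remove -= take
    return p
```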

Finite-Sample Analysis of Proximal Gradient TD Algorithms

no code implementations 6 Jun 2020 Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik

In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity

1 code implementation 6 Jun 2020 Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms.

reinforcement-learning · Reinforcement Learning (RL)
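
For reference, GTD2 is a representative member of the gradient-TD family analyzed in these two papers: it maintains a second weight vector w that tracks the projected TD error, yielding a true stochastic-gradient method. The update below is the textbook GTD2 step with linear features, not necessarily the exact proximal variant developed in the paper.

```python
import numpy as np

def gtd2_step(theta, w, phi, reward, phi_next, gamma=0.99, alpha=0.01, beta=0.01):
    """One GTD2 update with linear features phi (current) and phi_next (successor)."""
    delta = reward + gamma * phi_next @ theta - phi @ theta   # TD error
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    w = w + beta * (delta - phi @ w) * phi
    return theta, w
```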

Optimizing Norm-Bounded Weighted Ambiguity Sets for Robust MDPs

no code implementations 4 Dec 2019 Reazul Hasan Russel, Bahram Behzadian, Marek Petrik

Our proposed method computes weights from the value functions, and these weights then determine the shape of the ambiguity sets.

Optimizing Percentile Criterion Using Robust MDPs

no code implementations 23 Oct 2019 Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho

We then propose new algorithms that minimize the span of ambiguity sets defined by weighted $L_1$ and $L_\infty$ norms.

Reinforcement Learning (RL)

Robust Exploration with Tight Bayesian Plausibility Sets

no code implementations 17 Apr 2019 Reazul H. Russel, Tianyi Gu, Marek Petrik

Optimism about the poorly understood states and actions is the main driving force of exploration for many provably-efficient reinforcement learning algorithms.

Tight Bayesian Ambiguity Sets for Robust MDPs

no code implementations 15 Nov 2018 Reazul Hasan Russel, Marek Petrik

Robustness is important for sequential decision making in a stochastic dynamic environment with uncertain probabilistic parameters.

Decision Making · Reinforcement Learning (RL)

Interpretable Reinforcement Learning with Ensemble Methods

no code implementations 19 Sep 2018 Alexander Brown, Marek Petrik

We propose to use boosted regression trees as a way to compute human-interpretable solutions to reinforcement learning problems.

BIG-bench Machine Learning · Interpretable Machine Learning +3
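
One common way to plug boosted regression trees into reinforcement learning is fitted Q-iteration: repeatedly regress Bellman targets onto state-action features with a tree ensemble. The sketch below, using scikit-learn's GradientBoostingRegressor, illustrates that pattern; the paper's exact training procedure may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_q_iteration(S, A, R, S_next, n_actions, gamma=0.99, n_iters=20):
    """Fitted Q-iteration with boosted regression trees.

    S, S_next: (n, d) state features; A: (n,) actions; R: (n,) rewards.
    Returns a regressor approximating Q(s, a) from [state, action] inputs.
    """
    X = np.column_stack([S, A])
    model = None
    for _ in range(n_iters):
        if model is None:
            targets = R                                   # first pass: immediate rewards
        else:
            q_next = np.column_stack([
                model.predict(np.column_stack([S_next, np.full(len(S_next), a)]))
                for a in range(n_actions)
            ])
            targets = R + gamma * q_next.max(axis=1)      # Bellman backup
        model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
        model.fit(X, targets)
    return model
```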

Fast Bellman Updates for Robust MDPs

no code implementations ICML 2018 Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

The first algorithm uses a homotopy continuation method to compute updates for $L_1$-constrained $(s, a)$-rectangular ambiguity sets.

A Practical Method for Solving Contextual Bandit Problems Using Decision Trees

no code implementations 14 Jun 2017 Adam N. Elmachtoub, Ryan McNellis, Sechan Oh, Marek Petrik

We propose a new method for the contextual bandit problem that is simple, practical, and can be applied with little or no domain expertise.

Thompson Sampling
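
One simple way to get Thompson-sampling-like behavior from decision trees, in the spirit of the approach tagged above, is to refit a per-arm tree on a bootstrap resample of that arm's history each round and act greedily on the resampled predictions. The class below is an illustrative sketch with assumed names, not the paper's exact method.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class TreeBootstrapBandit:
    """Per-arm decision trees refit on bootstrap resamples each round;
    acting greedily on the resampled predictions mimics Thompson sampling."""

    def __init__(self, n_arms, max_depth=4):
        self.n_arms = n_arms
        self.max_depth = max_depth
        self.history = {a: ([], []) for a in range(n_arms)}   # contexts, rewards

    def select(self, context, rng=np.random):
        scores = []
        for a in range(self.n_arms):
            X, y = self.history[a]
            if len(y) < 2:                                    # force initial exploration
                return a
            idx = rng.randint(len(y), size=len(y))            # bootstrap resample
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(np.asarray(X)[idx], np.asarray(y)[idx])
            scores.append(tree.predict(context.reshape(1, -1))[0])
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.history[arm][0].append(context)
        self.history[arm][1].append(reward)
```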

Value Directed Exploration in Multi-Armed Bandits with Structured Priors

no code implementations 12 Apr 2017 Bence Cserna, Marek Petrik, Reazul Hasan Russel, Wheeler Ruml

Multi-armed bandits are a quintessential machine learning problem requiring the balancing of exploration and exploitation.

Multi-Armed Bandits

Building an Interpretable Recommender via Loss-Preserving Transformation

no code implementations 19 Jun 2016 Amit Dhurandhar, Sechan Oh, Marek Petrik

We propose a method for building an interpretable recommender system for personalizing online content and promotions.

Classification · General Classification +2

Robust Partially-Compressed Least-Squares

no code implementations 16 Oct 2015 Stephen Becker, Ban Kawas, Marek Petrik, Karthikeyan N. Ramamurthy

While maintaining computational efficiency, our models provide robust solutions that are more accurate, relative to solutions of uncompressed least-squares, than those of classical compressed variants.

Computational Efficiency
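
For context, the classical compressed least-squares baseline referred to above sketches the feature matrix with a random projection and solves the smaller problem; the robust partially-compressed models proposed in the paper modify this. A minimal sketch of the classical variant (names are assumptions):

```python
import numpy as np

def compressed_least_squares(A, b, k, seed=0):
    """Classical compressed least-squares: sketch the n features of A down to k
    with a Gaussian random projection, solve the smaller problem, and lift back."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    S = rng.standard_normal((n, k)) / np.sqrt(k)
    z, *_ = np.linalg.lstsq(A @ S, b, rcond=None)
    return S @ z      # approximate solution in the original coordinates
```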

A Bilinear Programming Approach for Multiagent Planning

no code implementations 15 Jan 2014 Marek Petrik, Shlomo Zilberstein

Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement.

Dimensionality Reduction

Solution Methods for Constrained Markov Decision Process with Continuous Probability Modulation

no code implementations 26 Sep 2013 Marek Petrik, Dharmashankar Subramanian, Janusz Marecki

We propose solution methods for previously unsolved constrained MDPs in which actions can continuously modify the transition probabilities within some acceptable sets.

Robust Value Function Approximation Using Bilinear Programming

no code implementations NeurIPS 2009 Marek Petrik, Shlomo Zilberstein

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds.

Biasing Approximate Dynamic Programming with a Lower Discount Factor

no code implementations NeurIPS 2008 Marek Petrik, Bruno Scherrer

We thus propose another justification: when the rewards are received only sporadically (as is the case in Tetris), we can derive tighter bounds, which support a significant performance increase with a decrease in the discount factor.
