2 code implementations • NeurIPS 2023 • Elita A. Lobo, Cyrus Cousins, Yair Zick, Marek Petrik
The percentile criterion is approximately solved by constructing an ambiguity set that contains the true model with high probability and optimizing the policy for the worst model in the set.
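The worst-case step over a simple ambiguity set can be made concrete. Below is a minimal Python sketch (not the paper's algorithm; the function name and the sort-and-shift scheme are illustrative) of the worst-case expected value over an $L_1$ ball around a nominal transition distribution:

```python
import numpy as np

def worst_case_l1(p_nominal, values, budget):
    """Minimize q @ values over distributions q with ||q - p_nominal||_1 <= budget.

    Shifts probability mass from high-value outcomes to the lowest-value one,
    which is optimal for an L1 ball intersected with the probability simplex.
    """
    p = np.asarray(p_nominal, dtype=float)
    v = np.asarray(values, dtype=float)
    order = np.argsort(v)                    # outcomes from lowest to highest value
    q = p.copy()
    add = min(budget / 2.0, 1.0 - q[order[0]])
    q[order[0]] += add                       # pile mass onto the worst outcome
    remaining = add
    for i in order[::-1]:                    # take the same mass from the best outcomes
        if i == order[0]:
            continue
        take = min(remaining, q[i])
        q[i] -= take
        remaining -= take
        if remaining <= 0:
            break
    return q, float(q @ v)

# nominal model is 50/50 between a bad (value 0) and a good (value 1) outcome
q, val = worst_case_l1([0.5, 0.5], [0.0, 1.0], budget=0.4)  # val == 0.3
```

With a budget of 0.4, the adversary moves 0.2 of probability mass from the good outcome to the bad one, dropping the expected value from 0.5 to 0.3.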
no code implementations • 6 Apr 2024 • Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju
Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive.
no code implementations • 2 Jun 2023 • Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz
We study how to make decisions that minimize Bayesian regret in offline linear bandits.
no code implementations • NeurIPS 2023 • Jia Lin Hau, Erick Delage, Mohammad Ghavamzadeh, Marek Petrik
However, we show that these popular decompositions for Conditional Value-at-Risk (CVaR) and Entropic Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level.
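As a refresher on the risk measure involved, a minimal empirical CVaR for a batch of sampled returns (lower-tail convention; a sample-count approximation rather than the exact Rockafellar-Uryasev formula, and not code from the paper) might look like:

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail).

    A simple sample-based approximation of CVaR_alpha; the exact definition
    weights the boundary sample fractionally.
    """
    x = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(x))))
    return float(x[:k].mean())

# the worst half of {1, 2, 3, 4} is {1, 2}, so CVaR_0.5 is 1.5
risk = empirical_cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5)
```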
1 code implementation • 20 Dec 2022 • Qiuhao Wang, Chin Pang Ho, Marek Petrik
In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs.
no code implementations • 21 Sep 2022 • Julien Grand-Clément, Marek Petrik
Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.
no code implementations • 9 Sep 2022 • Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh, Reazul Russel
We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs.
no code implementations • 27 May 2022 • Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty.
no code implementations • NeurIPS 2021 • Bahram Behzadian, Marek Petrik, Chin Pang Ho
Robust Markov decision processes (RMDPs) are a useful building block of robust reinforcement learning algorithms but can be hard to solve.
no code implementations • 11 Jun 2021 • Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg
Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
no code implementations • 4 Jan 2021 • Mostafa Hussein, Brendan Crowe, Marek Petrik, Momotaz Begum
Imitation learning (IL) algorithms use expert demonstrations to learn a specific task.
no code implementations • 30 Nov 2020 • Elita A. Lobo, Mohammad Ghavamzadeh, Marek Petrik
In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion, which minimizes the probability of a catastrophic failure.
1 code implementation • NeurIPS 2020 • Daniel S. Brown, Scott Niekum, Marek Petrik
Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function.
no code implementations • 20 Jun 2020 • Reazul Hasan Russel, Bahram Behzadian, Marek Petrik
Having a perfect model to compute the optimal policy is often infeasible in reinforcement learning.
1 code implementation • 16 Jun 2020 • Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
Robust Markov decision processes (MDPs) make it possible to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities.
no code implementations • 6 Jun 2020 • Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar Mahadevan, Marek Petrik
In this paper, we analyze the convergence rate of the gradient temporal difference learning (GTD) family of algorithms.
1 code implementation • 6 Jun 2020 • Bo Liu, Ian Gemp, Mohammad Ghavamzadeh, Ji Liu, Sridhar Mahadevan, Marek Petrik
In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms.
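One common presentation of a gradient TD update (the GTD2 variant, written here as an illustration of the family; this is not the proximal algorithm the paper introduces) is:

```python
import numpy as np

def gtd2_step(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 update for linear value estimation v(s) ~ theta @ phi(s).

    theta: value weights; w: auxiliary weights tracking the expected
    TD-error correction term used to form an unbiased gradient estimate.
    """
    delta = reward + gamma * theta @ phi_next - theta @ phi   # TD error
    w_new = w + beta * (delta - phi @ w) * phi                # track correction term
    theta_new = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    return theta_new, w_new

theta, w = np.zeros(1), np.zeros(1)
theta, w = gtd2_step(theta, w, np.array([1.0]), np.array([0.0]),
                     reward=1.0, gamma=0.9, alpha=0.1, beta=0.1)
```

Note that the value weights move only through the auxiliary estimate `phi @ w`, which is what distinguishes gradient TD methods from plain TD(0).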
no code implementations • 4 Dec 2019 • Reazul Hasan Russel, Bahram Behzadian, Marek Petrik
Our proposed method computes a weight parameter from the value functions, and these weights then drive the shape of the ambiguity sets.
no code implementations • 23 Oct 2019 • Bahram Behzadian, Reazul Hasan Russel, Marek Petrik, Chin Pang Ho
We then propose new algorithms that minimize the span of ambiguity sets defined by weighted $L_1$ and $L_\infty$ norms.
no code implementations • 17 Apr 2019 • Reazul H. Russel, Tianyi Gu, Marek Petrik
Optimism about the poorly understood states and actions is the main driving force of exploration for many provably-efficient reinforcement learning algorithms.
1 code implementation • NeurIPS 2019 • Marek Petrik, Reazul Hasan Russell
Robust MDPs (RMDPs) can be used to compute policies with provable worst-case guarantees in reinforcement learning.
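To make the worst-case guarantee concrete, here is a toy robust value iteration in which the ambiguity set is simplified to a finite set of candidate transition models (a sketch with illustrative names and shapes, not the paper's Bayesian ambiguity sets):

```python
import numpy as np

def robust_value_iteration(models, rewards, gamma=0.5, iters=100):
    """Robust VI: maximize over actions, minimize over candidate models.

    models: list of arrays P with P[a, s, s'] = transition probability
    rewards: array R with R[a, s] = immediate reward
    """
    n_actions, n_states = rewards.shape
    v = np.zeros(n_states)
    for _ in range(iters):
        q = np.stack([
            rewards[a] + gamma * np.min([P[a] @ v for P in models], axis=0)
            for a in range(n_actions)
        ])
        v = q.max(axis=0)
    return v, q.argmax(axis=0)

# two states, one action; the adversary may keep us in s0 or push us to the 0-reward s1
stay = np.array([[[1.0, 0.0], [0.0, 1.0]]])
leave = np.array([[[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[1.0, 0.0]])
v, policy = robust_value_iteration([stay, leave], R)
```

Here the worst-case value of the rewarding state is just its immediate reward, since the adversary can always move the agent to the absorbing zero-reward state.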
no code implementations • NeurIPS 2018 • Andrea Tirinzoni, Marek Petrik, Xiangli Chen, Brian Ziebart
What policy should be employed in a Markov decision process with uncertain parameters?
no code implementations • 15 Nov 2018 • Reazul Hasan Russel, Marek Petrik
Robustness is important for sequential decision making in a stochastic dynamic environment with uncertain probabilistic parameters.
no code implementations • 19 Sep 2018 • Alexander Brown, Marek Petrik
We propose to use boosted regression trees as a way to compute human-interpretable solutions to reinforcement learning problems.
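As a generic illustration of the regressor class involved (not the paper's RL algorithm; all names here are made up for the sketch), a minimal gradient-boosted ensemble of depth-1 regression stumps on a 1-D input can be written as:

```python
import numpy as np

def fit_stump(x, y):
    """Best single-threshold split of 1-D x minimizing squared error of y."""
    best_err, best = np.inf, None
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best_err:
            best_err, best = err, (t, left.mean(), right.mean())
    return best

def boost_stumps(x, y, rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each stump fits the current residual."""
    pred = np.zeros_like(y, dtype=float)
    ensemble = []
    for _ in range(rounds):
        t, lo, hi = fit_stump(x, y - pred)
        ensemble.append((t, lo, hi))
        pred += lr * np.where(x <= t, lo, hi)
    return ensemble, pred

x = np.arange(10, dtype=float)
y = (x >= 5).astype(float)            # a step function
_, pred = boost_stumps(x, y)
```

The resulting ensemble of thresholds is directly readable as a sequence of if-then rules, which is the interpretability argument for tree-based policies.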
no code implementations • ICML 2018 • Chin Pang Ho, Marek Petrik, Wolfram Wiesemann
The first algorithm uses a homotopy continuation method to compute updates for $L_1$-constrained $(s,a)$-rectangular ambiguity sets.
no code implementations • 14 Jun 2017 • Adam N. Elmachtoub, Ryan McNellis, Sechan Oh, Marek Petrik
We propose a new method for the contextual bandit problem that is simple, practical, and can be applied with little or no domain expertise.
no code implementations • 12 Apr 2017 • Bence Cserna, Marek Petrik, Reazul Hasan Russel, Wheeler Ruml
Multi-armed bandits are a quintessential machine learning problem requiring the balancing of exploration and exploitation.
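The exploration-exploitation balance mentioned above is often introduced via the UCB1 index rule; a minimal sketch (illustrative names, deterministic toy rewards, not the algorithm studied in the paper) follows:

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1: play each arm once, then the arm maximizing mean + exploration bonus."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t                      # initialization: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return counts

# deterministic toy rewards: arm 1 is clearly better, so it should dominate the pulls
counts = ucb1(lambda a: [0.1, 0.9][a], n_arms=2, horizon=200)
```

The bonus term shrinks as an arm accumulates pulls, so under-explored arms are periodically revisited even when their empirical mean is low.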
no code implementations • NeurIPS 2016 • Marek Petrik, Yin-Lam Chow, Mohammad Ghavamzadeh
We show that our formulation is NP-hard and propose an approximate algorithm.
no code implementations • 19 Jun 2016 • Amit Dhurandhar, Sechan Oh, Marek Petrik
We propose a method for building an interpretable recommender system for personalizing online content and promotions.
no code implementations • 16 Oct 2015 • Stephen Becker, Ban Kawas, Marek Petrik, Karthikeyan N. Ramamurthy
While maintaining computational efficiency, our models provide robust solutions that are more accurate (relative to solutions of uncompressed least squares) than those of classical compressed variants.
no code implementations • NeurIPS 2014 • Marek Petrik, Dharmashankar Subramanian
We describe how to use robust Markov decision processes for value function approximation with state aggregation.
no code implementations • 15 Jan 2014 • Marek Petrik, Shlomo Zilberstein
Because the algorithm is formulated for bilinear programs, it is more general and simpler to implement.
no code implementations • 26 Sep 2013 • Marek Petrik, Dharmashankar Subramanian, Janusz Marecki
We propose solution methods for previously unsolved constrained MDPs in which actions can continuously modify the transition probabilities within some acceptable sets.
no code implementations • NeurIPS 2009 • Marek Petrik, Shlomo Zilberstein
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds.
no code implementations • NeurIPS 2008 • Marek Petrik, Bruno Scherrer
We thus propose another justification: when the rewards are received only sporadically (as it is the case in Tetris), we can derive tighter bounds, which support a significant performance increase with a decrease in the discount factor.