Search Results for author: Shipra Agrawal

Found 23 papers, 3 papers with code

Dynamic Pricing and Learning with Long-term Reference Effects

no code implementations • 19 Feb 2024 • Shipra Agrawal, Wei Tang

We study a simple and novel reference price mechanism in which the reference price is the average of the past prices offered by the seller.
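
This is a minimal sketch of the mechanism. It assumes that the reference price entering period $t+1$ is the plain average $r_{t+1} = \frac{1}{t}\sum_{s=1}^{t} p_s$ of the prices offered so far. The initial reference price `r1`, used before any price has been offered, is a hypothetical input that the excerpt does not specify. A running sum keeps each update constant-time.

```python
from typing import List


def reference_prices(prices: List[float], r1: float) -> List[float]:
    """Return [r_1, r_2, ..., r_{T+1}], where r_{t+1} is the average of the first t prices.

    r1 is a hypothetical initial reference price (an assumption of this sketch).
    """
    refs = [r1]
    running_sum = 0.0
    for t, p in enumerate(prices, start=1):
        running_sum += p
        refs.append(running_sum / t)  # average of the first t offered prices
    return refs


print(reference_prices([10.0, 12.0, 8.0], r1=9.0))  # [9.0, 10.0, 11.0, 10.0]
```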

Dynamic Pricing and Learning with Bayesian Persuasion

no code implementations • NeurIPS 2023 • Shipra Agrawal, Yiding Feng, Wei Tang

Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality.

Online Allocation and Learning in the Presence of Strategic Agents

no code implementations • 25 Sep 2022 • Steven Yin, Shipra Agrawal, Assaf Zeevi

We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them.

Scale Free Adversarial Multi Armed Bandits

no code implementations • 8 Jun 2021 • Sudeep Raja Putta, Shipra Agrawal

This technique plays a crucial role in our analysis for controlling the regret when using importance weighted estimators of unbounded losses.
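
For context, the textbook importance-weighted estimator works as follows. It divides the observed loss of the pulled arm by that arm's sampling probability and sets every other entry to zero, which makes the estimate unbiased. The sketch below uses hypothetical function names and checks unbiasedness exactly by averaging over the random draw. It does not reproduce the paper's technique. Note that the estimate $\ell_i / p_i$ grows without bound when losses are unbounded and $p_i$ is small, which is the difficulty this snippet refers to.

```python
from typing import List


def iw_estimate(observed_loss: float, probs: List[float], chosen: int) -> List[float]:
    """Importance-weighted estimate of the full loss vector.

    Only the chosen arm's loss is observed; it is divided by the probability
    with which that arm was sampled, and all other entries are set to 0.
    """
    est = [0.0] * len(probs)
    est[chosen] = observed_loss / probs[chosen]
    return est


def expected_estimate(losses: List[float], probs: List[float]) -> List[float]:
    """Exact expectation of iw_estimate over the draw chosen ~ probs."""
    mean = [0.0] * len(losses)
    for j, pj in enumerate(probs):
        est = iw_estimate(losses[j], probs, j)
        for i, e in enumerate(est):
            mean[i] += pj * e
    return mean


# The expectation recovers the true losses, including a large one.
print(expected_estimate([1.0, 1000.0, 3.0], [0.5, 0.25, 0.25]))  # [1.0, 1000.0, 3.0]
```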

Multi-Armed Bandits

Dynamic Pricing and Learning under the Bass Model

no code implementations • 9 Mar 2021 • Shipra Agrawal, Steven Yin, Assaf Zeevi

Equivalently, the goal is to minimize the regret, which measures the revenue loss of the algorithm relative to the optimal expected revenue achievable under the stochastic Bass model with market size $m$ and time horizon $T$.

Reinforcement Learning for Integer Programming: Learning to Cut

no code implementations • ICML 2020 • Yunhao Tang, Shipra Agrawal, Yuri Faenza

In particular, we investigate a specific methodology for solving integer programs (IPs), known as the Cutting Plane Method.
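
The sketch below illustrates what a single cut looks like. It derives a Gomory fractional cut from one row of an optimal simplex tableau, which is a classical way to generate candidate cutting planes. The sketch assumes that all variables are integer and nonnegative, and that the row reads $x_B + \sum_j a_j x_j = b$ over nonbasic variables $x_j$. The resulting cut is $\sum_j \mathrm{frac}(a_j)\, x_j \ge \mathrm{frac}(b)$. How a learning agent would choose among such cuts is not shown. Exact rational arithmetic via `fractions.Fraction` avoids rounding problems with fractional parts.

```python
from fractions import Fraction
from math import floor
from typing import List, Tuple


def frac(y: Fraction) -> Fraction:
    """Fractional part y - floor(y); always in [0, 1), also for negative y."""
    return y - floor(y)


def gomory_cut(row: List[Fraction], rhs: Fraction) -> Tuple[List[Fraction], Fraction]:
    """Cut sum_j frac(a_j) * x_j >= frac(b) from tableau row x_B + sum_j a_j x_j = b."""
    if frac(rhs) == 0:
        raise ValueError("basic variable is already integral; no cut from this row")
    return [frac(a) for a in row], frac(rhs)


coeffs, bound = gomory_cut([Fraction(1, 2), Fraction(-3, 4)], Fraction(7, 2))
print(coeffs, bound)  # [Fraction(1, 2), Fraction(1, 4)] 1/2
```

At the current LP vertex every nonbasic $x_j$ is 0, so the left-hand side is $0 < \mathrm{frac}(b)$. The cut is therefore violated by the current fractional solution, while every integer-feasible point satisfies it.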

Reinforcement Learning (RL)

Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management

no code implementations • 10 May 2019 • Shipra Agrawal, Randy Jia

We consider the relatively less studied problem of designing a learning algorithm for this inventory management problem when the underlying demand distribution is unknown.

Management

Discretizing Continuous Action Space for On-Policy Optimization

2 code implementations • 29 Jan 2019 • Yunhao Tang, Shipra Agrawal

In this work, we show that discretizing action space for continuous control is a simple yet powerful technique for on-policy optimization.
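
This is a minimal sketch of the idea, under two assumptions. First, each action dimension is discretized independently into `k` evenly spaced atoms on `[low, high]`. Second, the policy is a factorized categorical distribution, making one independent choice per dimension. In practice the per-dimension logits would come from a policy network; here they are plain inputs, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def atoms(low: float, high: float, k: int) -> List[float]:
    """k >= 2 evenly spaced candidate values covering [low, high]."""
    step = (high - low) / (k - 1)
    return [low + i * step for i in range(k)]


def sample_action(logits_per_dim: Sequence[Sequence[float]], low: float, high: float,
                  rng: random.Random) -> List[float]:
    """Sample one continuous action: each dimension independently picks an atom
    with softmax(logits) probabilities."""
    action = []
    for logits in logits_per_dim:
        m = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(z - m) for z in logits]
        idx = rng.choices(range(len(logits)), weights=weights)[0]
        action.append(atoms(low, high, len(logits))[idx])
    return action


print(atoms(-1.0, 1.0, 5))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(sample_action([[0.0] * 5, [0.0] * 5], -1.0, 1.0, random.Random(0)))
```

With `k` atoms per dimension and `d` dimensions, the factorized policy needs only `d * k` logits rather than one per joint action (`k ** d`), which keeps the discretization tractable in higher dimensions.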

Continuous Control • Inductive Bias

Boosting Trust Region Policy Optimization by Normalizing Flows Policy

1 code implementation • 27 Sep 2018 • Yunhao Tang, Shipra Agrawal

We propose to improve trust region policy search with a normalizing flows policy.

Proportional Allocation: Simple, Distributed, and Diverse Matching with High Entropy

no code implementations • ICML 2018 • Shipra Agrawal, Morteza Zadimoghaddam, Vahab Mirrokni

Inspired by many applications of bipartite matching in online advertising and machine learning, we study a simple and natural iterative proportional allocation algorithm: Maintain a priority score $p_a$ for each node $a \in \mathbb{A}$ on one side of the bipartition, initialized as $p_a = 1$.
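
This is a minimal sketch of one allocation round as the snippet describes it. Every node starts with priority score 1. Each item on the other side is then split fractionally among its eligible nodes in proportion to their scores. The snippet is cut off before the rule for updating scores after a round, so that step is deliberately left out. The data layout, a dictionary from each item to its eligible nodes, is an assumption.

```python
from typing import Dict, List


def proportional_round(eligible: Dict[str, List[str]],
                       priority: Dict[str, float]) -> Dict[str, float]:
    """Split each item among its eligible nodes in proportion to priority scores.

    Returns the total (fractional) amount allocated to each node in this round.
    """
    load = {a: 0.0 for a in priority}
    for nodes in eligible.values():
        total = sum(priority[a] for a in nodes)
        if total == 0:
            continue  # item with no eligible node (or all-zero scores) stays unallocated
        for a in nodes:
            load[a] += priority[a] / total
    return load


node_ids = ["a1", "a2", "a3"]
priority = {a: 1.0 for a in node_ids}  # initialized as 1, per the snippet
eligible = {"i1": ["a1", "a2"], "i2": ["a2"]}
print(proportional_round(eligible, priority))  # {'a1': 0.5, 'a2': 1.5, 'a3': 0.0}
```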

BIG-bench Machine Learning • Fairness

Implicit Policy for Reinforcement Learning

no code implementations • 10 Jun 2018 • Yunhao Tang, Shipra Agrawal

We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy regularized policy gradients.

Reinforcement Learning (RL)

Exploration by Distributional Reinforcement Learning

no code implementations • 4 May 2018 • Yunhao Tang, Shipra Agrawal

We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.

Distributional Reinforcement Learning • Efficient Exploration

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

no code implementations • NeurIPS 2017 • Shipra Agrawal, Randy Jia

Our main result is a high probability regret upper bound of $\tilde{O}(D\sqrt{SAT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$, when $T\ge S^5A$.

Reinforcement Learning (RL)

Bandits with Delayed, Aggregated Anonymous Feedback

no code implementations • ICML 2018 • Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvari, Steffen Grunewalder

In this problem, when the player pulls an arm, a reward is generated; however, it is not immediately observed.
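
The sketch below simulates this feedback model to make it concrete. Each pull produces a reward that arrives after some delay. At every step the player observes only the sum of the rewards arriving at that step, without learning which pulls or arms they came from. The delays and rewards are supplied as hypothetical inputs. Rewards that would arrive after the horizon are never observed.

```python
from collections import defaultdict
from typing import List, Tuple


def aggregated_feedback(pulls: List[Tuple[str, float, int]]) -> List[float]:
    """pulls[s] = (arm, reward, delay) for the pull made at step s.

    Returns observed[t]: the sum of rewards from all pulls s with s + delay == t.
    Arm labels are dropped, so the feedback is anonymous.
    """
    arrivals = defaultdict(float)
    for s, (_arm, reward, delay) in enumerate(pulls):
        arrivals[s + delay] += reward
    return [arrivals[t] for t in range(len(pulls))]


pulls = [("A", 1.0, 1), ("B", 2.0, 0), ("A", 4.0, 5)]
print(aggregated_feedback(pulls))  # [0.0, 3.0, 0.0]
```

At step 1 the player sees 3.0 but cannot tell that it combines arm A's reward from step 0 with arm B's reward from step 1. Arm A's reward of 4.0 lands after the horizon and is never seen.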

MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

no code implementations • 13 Jun 2017 • Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

The retailer observes this choice, and the objective is to dynamically learn the model parameters while optimizing cumulative revenues over a selling horizon of length $T$.
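
The sketch below evaluates the expected revenue of an assortment under the standard multinomial logit (MNL) choice model. It assumes known preference weights $v_i$, known prices $r_i$, and a no-purchase option with weight normalized to 1. Under these assumptions, $R(S) = \sum_{i \in S} r_i v_i / (1 + \sum_{j \in S} v_j)$. The exhaustive search over assortments of at most `K` items (an assumed capacity) is for illustration on tiny instances only. Learning the weights from observed choices is not shown.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple


def expected_revenue(S: Sequence[int], v: Dict[int, float], r: Dict[int, float]) -> float:
    """R(S) = sum_{i in S} r_i v_i / (1 + sum_{j in S} v_j) under MNL with v_0 = 1."""
    denom = 1.0 + sum(v[j] for j in S)
    return sum(r[i] * v[i] for i in S) / denom


def best_assortment(items: Sequence[int], v: Dict[int, float], r: Dict[int, float],
                    K: int) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search over nonempty assortments with at most K items."""
    best: Tuple[float, Tuple[int, ...]] = (0.0, ())
    for k in range(1, K + 1):
        for S in combinations(items, k):
            best = max(best, (expected_revenue(S, v, r), S))
    return best


v = {1: 1.0, 2: 1.0}
r = {1: 1.0, 2: 6.0}
print(best_assortment([1, 2], v, r, K=2))  # (3.0, (2,))
```

In this example, adding the low-price item 1 lowers revenue from 3.0 to 7/3. Offering it diverts some purchases away from the high-price item 2, which is why the choice of assortment matters.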

Thompson Sampling for the MNL-Bandit

no code implementations • 3 Jun 2017 • Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes (bandit) feedback in the form of the index of one of the items in that subset, or none.
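
The bandit feedback described here can be simulated under the MNL choice model, which is assumed here based on the title. When offered a subset $S$, the customer picks item $i$ with probability $v_i / (1 + \sum_{j \in S} v_j)$. The customer picks nothing with probability $1 / (1 + \sum_{j \in S} v_j)$. The weights `v` are hypothetical inputs. A Thompson Sampling learner would maintain a posterior over these weights; this sketch omits that part.

```python
import random
from typing import Dict, Optional, Sequence


def choice_probs(S: Sequence[int], v: Dict[int, float]) -> Dict[Optional[int], float]:
    """MNL choice probabilities over S plus the no-purchase outcome (key None)."""
    denom = 1.0 + sum(v[j] for j in S)
    probs: Dict[Optional[int], float] = {None: 1.0 / denom}
    for i in S:
        probs[i] = v[i] / denom
    return probs


def observe_choice(S: Sequence[int], v: Dict[int, float], rng: random.Random) -> Optional[int]:
    """Bandit feedback: the index of the chosen item, or None if nothing is chosen."""
    probs = choice_probs(S, v)
    outcomes = list(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes])[0]


print(choice_probs([1, 2], {1: 1.0, 2: 2.0}))  # {None: 0.25, 1: 0.25, 2: 0.5}
print(observe_choice([1, 2], {1: 1.0, 2: 2.0}, random.Random(0)))
```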

Thompson Sampling

Posterior sampling for reinforcement learning: worst-case regret bounds

no code implementations • 19 May 2017 • Shipra Agrawal, Randy Jia

We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter.

Reinforcement Learning (RL)

Linear Contextual Bandits with Knapsacks

no code implementations • NeurIPS 2016 • Shipra Agrawal, Nikhil R. Devanur

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation.

Multi-Armed Bandits

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

no code implementations • 10 Jun 2015 • Shipra Agrawal, Nikhil R. Devanur, Lihong Li

This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it.

Multi-Armed Bandits • Open-Ended Question Answering

Fast Algorithms for Online Stochastic Convex Programming

no code implementations • 28 Oct 2014 • Shipra Agrawal, Nikhil R. Devanur

We introduce the online stochastic Convex Programming (CP) problem, a very general version of stochastic online problems which allows arbitrary concave objectives and convex feasibility constraints.

Bandits with concave rewards and convex knapsacks

no code implementations • 24 Feb 2014 • Shipra Agrawal, Nikhil R. Devanur

In this paper, we consider a very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon.

Thompson Sampling for Contextual Bandits with Linear Payoffs

1 code implementation • 15 Sep 2012 • Shipra Agrawal, Navin Goyal

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.
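
This sketch shows how the basic heuristic works in the simplest case: independent arms with Bernoulli rewards and uniform Beta(1, 1) priors. At each step, it samples a mean for each arm from that arm's posterior and pulls the arm with the largest sample. It covers only the classic multi-armed version. The contextual bandit setting with linear payoffs, which is this paper's subject, is not covered. The simulated arm means are hypothetical.

```python
import random
from typing import List


def thompson_sampling(true_means: List[float], horizon: int, rng: random.Random) -> List[int]:
    """Run Beta-Bernoulli Thompson Sampling; return how often each arm was pulled."""
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # One posterior sample per arm: Beta(successes + 1, failures + 1).
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0  # simulated Bernoulli reward
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


print(thompson_sampling([0.2, 0.5, 0.8], horizon=2000, rng=random.Random(42)))
```

The best arm (mean 0.8) should receive most of the pulls. Weaker arms are still tried now and then, because their posteriors stay comparatively wide.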

Multi-Armed Bandits • Thompson Sampling

A Dynamic Near-Optimal Algorithm for Online Linear Programming

no code implementations • 16 Nov 2009 • Shipra Agrawal, Zizhuo Wang, Yinyu Ye

A natural optimization model that formulates many online resource allocation and revenue management problems is the online linear program (LP) in which the constraint matrix is revealed column by column along with the corresponding objective coefficient.
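
This is a minimal sketch of a price-based decision rule for this online LP. It assumes that each arriving column $j$ brings a reward $\pi_j$ and a resource-consumption vector $a_j$, and that decisions are binary, $x_j \in \{0, 1\}$. The rule accepts a column when $\pi_j > p^\top a_j$ and the column still fits in the remaining capacity. Here the dual prices $p$ are a fixed, hypothetical input. A learning algorithm would instead estimate and update them from the columns observed so far, which is not shown.

```python
from typing import List, Sequence, Tuple


def online_accept(columns: Sequence[Tuple[float, Sequence[float]]],
                  b: Sequence[float],
                  prices: Sequence[float]) -> Tuple[List[int], float]:
    """Decide x_j in {0, 1} for columns (pi_j, a_j) revealed one at a time.

    Accept iff pi_j > p^T a_j and a_j still fits in the remaining capacity.
    """
    remaining = list(b)
    decisions: List[int] = []
    revenue = 0.0
    for pi, a in columns:
        priced_cost = sum(p * aij for p, aij in zip(prices, a))
        fits = all(aij <= cap for aij, cap in zip(a, remaining))
        x = 1 if pi > priced_cost and fits else 0
        if x:
            remaining = [cap - aij for cap, aij in zip(remaining, a)]
            revenue += pi
        decisions.append(x)
    return decisions, revenue


# One resource with capacity 2; every column uses one unit.
columns = [(5.0, [1.0]), (1.0, [1.0]), (4.0, [1.0]), (6.0, [1.0])]
print(online_accept(columns, b=[2.0], prices=[2.0]))  # ([1, 0, 1, 0], 9.0)
```

With price 2.0 the rule accepts the columns worth 5.0 and 4.0, leaving no capacity for the later 6.0. The offline optimum takes 6.0 and 5.0 for a total of 11.0. A price of 4.5 would reject the 4.0 column and reach 11.0, which illustrates why learning good prices matters.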

Management
