Search Results for author: Shipra Agrawal

Found 23 papers, 3 papers with code

Dynamic Pricing and Learning with Long-term Reference Effects

no code implementations 19 Feb 2024 Shipra Agrawal, Wei Tang

We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller.
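
As a hedged illustration of the averaging rule in this entry: write $p_s$ for the price offered in period $s$ and $r_t$ for the reference price in period $t$ (both symbols are assumptions for illustration, not necessarily the paper's notation). A plain average of past prices could then be written as

$$r_t = \frac{1}{t-1}\sum_{s=1}^{t-1} p_s, \qquad t \ge 2,$$

which admits the equivalent incremental form

$$r_{t+1} = r_t + \frac{1}{t}\,(p_t - r_t), \qquad t \ge 2.$$

The snippet does not say how the initial reference price $r_1$ is set, so it is left unspecified here.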

Dynamic Pricing and Learning with Bayesian Persuasion

no code implementations NeurIPS 2023 Shipra Agrawal, Yiding Feng, Wei Tang

Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality.

Online Allocation and Learning in the Presence of Strategic Agents

no code implementations 25 Sep 2022 Steven Yin, Shipra Agrawal, Assaf Zeevi

We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them.

Scale Free Adversarial Multi Armed Bandits

no code implementations 8 Jun 2021 Sudeep Raja Putta, Shipra Agrawal

This technique plays a crucial role in our analysis for controlling the regret when using importance weighted estimators of unbounded losses.

Multi-Armed Bandits

Dynamic Pricing and Learning under the Bass Model

no code implementations 9 Mar 2021 Shipra Agrawal, Steven Yin, Assaf Zeevi

Equivalently, the goal is to minimize the regret which measures the revenue loss of the algorithm relative to the optimal expected revenue achievable under the stochastic Bass model with market size $m$ and time horizon $T$.

Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management

no code implementations 10 May 2019 Shipra Agrawal, Randy Jia

We consider the relatively less studied problem of designing a learning algorithm for this problem when the underlying demand distribution is unknown.

Management

Discretizing Continuous Action Space for On-Policy Optimization

2 code implementations 29 Jan 2019 Yunhao Tang, Shipra Agrawal

In this work, we show that discretizing action space for continuous control is a simple yet powerful technique for on-policy optimization.

Continuous Control, Inductive Bias

Boosting Trust Region Policy Optimization by Normalizing Flows Policy

1 code implementation 27 Sep 2018 Yunhao Tang, Shipra Agrawal

We propose to improve trust region policy search with normalizing flows policy.

Proportional Allocation: Simple, Distributed, and Diverse Matching with High Entropy

no code implementations ICML 2018 Shipra Agrawal, Morteza Zadimoghaddam, Vahab Mirrokni

Inspired by many applications of bipartite matching in online advertising and machine learning, we study a simple and natural iterative proportional allocation algorithm: Maintain a priority score $\mathrm{priority}_a$ for each node $a\in \mathbb{A}$ on one side of the bipartition, initialized as $\mathrm{priority}_a=1$.

BIG-bench Machine Learning, Fairness

Implicit Policy for Reinforcement Learning

no code implementations 10 Jun 2018 Yunhao Tang, Shipra Agrawal

We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy regularized policy gradients.

Reinforcement Learning (RL)

Exploration by Distributional Reinforcement Learning

no code implementations 4 May 2018 Yunhao Tang, Shipra Agrawal

We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.

Distributional Reinforcement Learning, Efficient Exploration

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

no code implementations NeurIPS 2017 Shipra Agrawal, Randy Jia

Our main result is a high probability regret upper bound of $\tilde{O}(D\sqrt{SAT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$, when $T\ge S^5A$.

Reinforcement Learning (RL)

Bandits with Delayed, Aggregated Anonymous Feedback

no code implementations ICML 2018 Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvari, Steffen Grunewalder

In this problem, when the player pulls an arm, a reward is generated; however, it is not immediately observed.

MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

no code implementations 13 Jun 2017 Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

The retailer observes this choice and the objective is to dynamically learn the model parameters, while optimizing cumulative revenues over a selling horizon of length $T$.

Thompson Sampling for the MNL-Bandit

no code implementations 3 Jun 2017 Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes a (bandit) feedback in the form of the index of one of the items in said subset, or none.

Thompson Sampling

Posterior sampling for reinforcement learning: worst-case regret bounds

no code implementations 19 May 2017 Shipra Agrawal, Randy Jia

We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter.

Reinforcement Learning (RL)

Linear Contextual Bandits with Knapsacks

no code implementations NeurIPS 2016 Shipra Agrawal, Nikhil R. Devanur

We consider the linear contextual bandit problem with resource consumption, in addition to reward generation.

Multi-Armed Bandits

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

no code implementations 10 Jun 2015 Shipra Agrawal, Nikhil R. Devanur, Lihong Li

This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it.

Multi-Armed Bandits, Open-Ended Question Answering

Fast Algorithms for Online Stochastic Convex Programming

no code implementations 28 Oct 2014 Shipra Agrawal, Nikhil R. Devanur

We introduce the online stochastic Convex Programming (CP) problem, a very general version of stochastic online problems which allows arbitrary concave objectives and convex feasibility constraints.

Bandits with concave rewards and convex knapsacks

no code implementations 24 Feb 2014 Shipra Agrawal, Nikhil R. Devanur

In this paper, we consider a very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon.

Thompson Sampling for Contextual Bandits with Linear Payoffs

1 code implementation 15 Sep 2012 Shipra Agrawal, Navin Goyal

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.

Multi-Armed Bandits, Thompson Sampling
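
For readers unfamiliar with the heuristic named in the entry above, here is a minimal sketch of classic Thompson Sampling for a Bernoulli multi-armed bandit with Beta(1, 1) priors. It is not the contextual, linear-payoff algorithm analyzed in the paper; the function name `thompson_sampling`, its parameters, and the simulated reward model are all assumptions made for illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling (illustrative sketch).

    true_means: hypothetical success probability of each arm (unknown to the learner).
    horizon: number of rounds to play.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k  # observed rewards of 1 per arm
    failures = [0] * k   # observed rewards of 0 per arm
    pulls = [0] * k

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])

        # Simulated Bernoulli reward from the (assumed) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], horizon=2000))
```

With a large gap between arm means, the sampler should concentrate its pulls on the better arm after a short exploration phase.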

A Dynamic Near-Optimal Algorithm for Online Linear Programming

no code implementations 16 Nov 2009 Shipra Agrawal, Zizhuo Wang, Yinyu Ye

A natural optimization model that formulates many online resource allocation and revenue management problems is the online linear program (LP) in which the constraint matrix is revealed column by column along with the corresponding objective coefficient.

Management
