no code implementations • 19 Feb 2024 • Shipra Agrawal, Wei Tang
We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller.
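As a minimal sketch of the averaging rule described in this excerpt, the reference price in each round can be computed as the running mean of all earlier prices. The function name `reference_prices` and the choice to return `None` in the first round (when there is no price history) are illustrative assumptions, not details taken from the paper.

```python
def reference_prices(prices):
    """Reference price before each round: the mean of all earlier prices.

    Assumption: the first round has no history, so its reference is None.
    """
    refs = []
    total = 0.0  # sum of prices offered in earlier rounds
    for t, price in enumerate(prices):
        refs.append(total / t if t > 0 else None)
        total += price
    return refs


if __name__ == "__main__":
    print(reference_prices([10.0, 20.0, 30.0]))
```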
no code implementations • NeurIPS 2023 • Shipra Agrawal, Yiding Feng, Wei Tang
Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality.
no code implementations • 25 Sep 2022 • Steven Yin, Shipra Agrawal, Assaf Zeevi
We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them.
no code implementations • 8 Jun 2021 • Sudeep Raja Putta, Shipra Agrawal
This technique plays a crucial role in our analysis for controlling the regret when using importance weighted estimators of unbounded losses.
no code implementations • 9 Mar 2021 • Shipra Agrawal, Steven Yin, Assaf Zeevi
Equivalently, the goal is to minimize the regret which measures the revenue loss of the algorithm relative to the optimal expected revenue achievable under the stochastic Bass model with market size $m$ and time horizon $T$.
no code implementations • ICML 2020 • Yunhao Tang, Shipra Agrawal, Yuri Faenza
In particular, we investigate a specific methodology for solving IPs, known as the Cutting Plane Method.
no code implementations • 10 May 2019 • Shipra Agrawal, Randy Jia
We consider the relatively less studied problem of designing a learning algorithm for this problem when the underlying demand distribution is unknown.
2 code implementations • 29 Jan 2019 • Yunhao Tang, Shipra Agrawal
In this work, we show that discretizing action space for continuous control is a simple yet powerful technique for on-policy optimization.
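One common way to discretize a continuous action space, sketched here under assumptions, is to replace each action dimension with a fixed set of evenly spaced values and let the policy pick one index per dimension. The helper names `make_atoms` and `to_continuous`, the even spacing, and the per-dimension factorization are illustrative choices; the excerpt does not specify the paper's exact scheme.

```python
def make_atoms(low, high, k):
    """Return k evenly spaced action values covering [low, high]."""
    if k < 2:
        raise ValueError("need at least two atoms per dimension")
    step = (high - low) / (k - 1)
    return [low + i * step for i in range(k)]


def to_continuous(indices, atoms_per_dim):
    """Map one discrete index per dimension to a continuous action vector."""
    return [atoms[i] for i, atoms in zip(indices, atoms_per_dim)]


if __name__ == "__main__":
    atoms = make_atoms(-1.0, 1.0, 5)
    # A 2-dimensional action: lowest value in dim 0, highest in dim 1.
    print(to_continuous([0, 4], [atoms, atoms]))
```

With this factorization a policy over a d-dimensional action outputs d categorical distributions with k choices each, rather than one distribution over k^d joint actions.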
1 code implementation • 27 Sep 2018 • Yunhao Tang, Shipra Agrawal
We propose to improve trust region policy search with a normalizing flows policy.
no code implementations • ICML 2018 • Shipra Agrawal, Morteza Zadimoghaddam, Vahab Mirrokni
Inspired by many applications of bipartite matching in online advertising and machine learning, we study a simple and natural iterative proportional allocation algorithm: maintain a priority score $\mathrm{priority}_a$ for each node $a\in \mathbb{A}$ on one side of the bipartition, initialized as $\mathrm{priority}_a=1$.
no code implementations • 10 Jun 2018 • Yunhao Tang, Shipra Agrawal
We introduce Implicit Policy, a general class of expressive policies that can flexibly represent complex action distributions in reinforcement learning, with efficient algorithms to compute entropy regularized policy gradients.
no code implementations • 4 May 2018 • Yunhao Tang, Shipra Agrawal
We propose a framework based on distributional reinforcement learning and recent attempts to combine Bayesian parameter updates with deep reinforcement learning.
no code implementations • NeurIPS 2017 • Shipra Agrawal, Randy Jia
Our main result is a high probability regret upper bound of $\tilde{O}(D\sqrt{SAT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$, when $T\ge S^5A$.
no code implementations • ICML 2018 • Ciara Pike-Burke, Shipra Agrawal, Csaba Szepesvari, Steffen Grunewalder
In this problem, when the player pulls an arm, a reward is generated but is not immediately observed.
no code implementations • 13 Jun 2017 • Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi
The retailer observes this choice and the objective is to dynamically learn the model parameters, while optimizing cumulative revenues over a selling horizon of length $T$.
no code implementations • 3 Jun 2017 • Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi
We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes (bandit) feedback in the form of the index of one of the items in said subset, or none.
no code implementations • 19 May 2017 • Shipra Agrawal, Randy Jia
We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter.
no code implementations • NeurIPS 2016 • Shipra Agrawal, Nikhil R. Devanur
We consider the linear contextual bandit problem with resource consumption, in addition to reward generation.
no code implementations • 10 Jun 2015 • Shipra Agrawal, Nikhil R. Devanur, Lihong Li
This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it.
no code implementations • 28 Oct 2014 • Shipra Agrawal, Nikhil R. Devanur
We introduce the online stochastic Convex Programming (CP) problem, a very general version of stochastic online problems which allows arbitrary concave objectives and convex feasibility constraints.
no code implementations • 24 Feb 2014 • Shipra Agrawal, Nikhil R. Devanur
In this paper, we consider a very general model for exploration-exploitation tradeoff which allows arbitrary concave rewards and convex constraints on the decisions across time, in addition to the customary limitation on the time horizon.
1 code implementation • 15 Sep 2012 • Shipra Agrawal, Navin Goyal
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems.
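For readers unfamiliar with the heuristic, the following is a minimal sketch of Thompson Sampling for Bernoulli rewards with independent Beta(1, 1) priors. The function name `thompson_sampling`, the simulated arms, and the fixed seed are assumptions made for illustration; this is the textbook form of the heuristic, not code from the paper.

```python
import random


def thompson_sampling(true_means, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling; return pull counts per arm."""
    rng = random.Random(seed)
    n = len(true_means)
    successes = [0] * n  # observed rewards of 1 per arm
    failures = [0] * n   # observed rewards of 0 per arm
    pulls = [0] * n
    for _ in range(rounds):
        # Draw one plausible mean per arm from its Beta posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(n)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], rounds=2000))
```

Because arms are chosen by sampling from the posterior rather than by a point estimate, arms with few observations are still tried occasionally, while most pulls tend to concentrate on the arm with the higher mean.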
no code implementations • 16 Nov 2009 • Shipra Agrawal, Zizhuo Wang, Yinyu Ye
A natural optimization model that formulates many online resource allocation and revenue management problems is the online linear program (LP) in which the constraint matrix is revealed column by column along with the corresponding objective coefficient.