Search Results for author: Randy Jia

Found 6 papers, 0 papers with code

Learning an Inventory Control Policy with General Inventory Arrival Dynamics

no code implementations · 26 Oct 2023 · Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade

In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term a quantity-over-time arrivals model (QOT).
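As a rough illustration of this setting, the sketch below simulates a lost-sales inventory system in which each order's quantity arrives spread over several future periods rather than all at once after a lead time. The Dirichlet arrival split, the Poisson demand, and the reward terms are illustrative assumptions, not the paper's learned QOT model.

```python
import numpy as np

def sample_qot_arrivals(order_qty, max_lead, rng):
    """Illustrative quantity-over-time arrivals: spread an order's quantity
    across several future periods via a random Dirichlet split (an assumed
    form, not the paper's learned arrival model)."""
    fractions = rng.dirichlet(np.ones(max_lead))
    return order_qty * fractions  # arrivals[t] = units arriving t+1 periods out

def simulate_inventory(policy, horizon=52, max_lead=4, seed=0):
    """Roll out a simple lost-sales inventory system under QOT-style arrivals."""
    rng = np.random.default_rng(seed)
    on_hand, pipeline = 0.0, np.zeros(max_lead)
    total_reward = 0.0
    for _ in range(horizon):
        on_hand += pipeline[0]                      # receive today's arrivals
        pipeline = np.append(pipeline[1:], 0.0)     # shift the arrival pipeline
        demand = rng.poisson(10)                    # placeholder demand process
        sales = min(on_hand, demand)                # unmet demand is lost
        on_hand -= sales
        order_qty = policy(on_hand, pipeline)       # e.g. a base-stock rule
        pipeline += sample_qot_arrivals(order_qty, max_lead, rng)
        total_reward += sales - 0.1 * on_hand       # revenue minus holding cost
    return total_reward

# Example: backtest a simple base-stock policy under the simulated dynamics.
print(simulate_inventory(lambda inv, pipe: max(0.0, 40 - inv - pipe.sum())))
```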

Contextual Bandits for Evaluating and Improving Inventory Control Policies

no code implementations · 24 Oct 2023 · Dean Foster, Randy Jia, Dhruv Madeka

Solutions to the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions about the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning.

Task: Multi-Armed Bandits
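To make the bandit framing concrete, here is a minimal LinUCB-style contextual bandit sketch. LinUCB is a standard algorithm used purely for illustration; the interpretation of contexts as recent demand features, arms as candidate policy adjustments, and rewards as backtested outcomes is an assumption, not the paper's construction.

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB contextual bandit (a standard algorithm, shown only
    as an illustration of the evaluate-and-improve framing)."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Predicted reward plus an exploration bonus.
            scores.append(context @ theta + self.alpha * np.sqrt(context @ A_inv @ context))
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Example loop: contexts could summarize demand and inventory position,
# arms candidate adjustments to a baseline ordering policy; the reward
# below is a stand-in signal for demonstration only.
rng = np.random.default_rng(0)
bandit = LinUCB(n_arms=3, dim=4)
for _ in range(500):
    x = rng.normal(size=4)
    arm = bandit.select(x)
    reward = x[arm] + rng.normal(scale=0.1)
    bandit.update(arm, x, reward)
```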

Linear Reinforcement Learning with Ball Structure Action Space

no code implementations · 14 Nov 2022 · Zeyu Jia, Randy Jia, Dhruv Madeka, Dean P. Foster

We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e., assuming the optimal action-value function is linear in a known $d$-dimensional feature mapping.

Task: Reinforcement Learning (RL)
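A small sketch of what the linear, ball-structured setting can look like: when the action-value estimate is linear in the features and the action set is a Euclidean ball, the greedy action has a closed form along the estimated parameter direction. The additive feature form phi(s) + a below is an assumption made for illustration, not necessarily the paper's mapping.

```python
import numpy as np

def greedy_ball_action(theta, radius=1.0):
    """Greedy action when Q(s, a) = <phi(s) + a, theta> is linear and the
    action set is a Euclidean ball: the maximizer of a linear function over
    a ball is the radius times theta's direction (illustrative feature form)."""
    return radius * theta / (np.linalg.norm(theta) + 1e-12)

# Example with a 5-dimensional feature mapping.
d = 5
theta_hat = np.random.default_rng(0).normal(size=d)  # estimated linear parameter
phi_s = np.zeros(d)                                  # placeholder state features
a_star = greedy_ball_action(theta_hat, radius=1.0)
q_value = (phi_s + a_star) @ theta_hat               # linear action-value estimate
print(a_star, q_value)
```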

Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management

no code implementations · 10 May 2019 · Shipra Agrawal, Randy Jia

We consider the relatively less studied problem of designing a learning algorithm for inventory management when the underlying demand distribution is unknown.

Task: Management
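For intuition about acting with an unknown demand distribution, the sketch below computes a sample-based order-up-to level via the standard newsvendor critical fractile. This is a generic estimate shown only to make the setting concrete; it is not the regret-minimizing algorithm analyzed in the paper.

```python
import numpy as np

def empirical_base_stock(demand_samples, holding_cost, lost_sales_cost):
    """Sample-based base-stock level: the critical-fractile quantile of
    observed demand (classic newsvendor estimate, for illustration only)."""
    q = lost_sales_cost / (lost_sales_cost + holding_cost)  # critical fractile
    return float(np.quantile(demand_samples, q))

# Example: estimate an order-up-to level from observed demands.
rng = np.random.default_rng(0)
observed = rng.poisson(lam=20, size=200)
print(empirical_base_stock(observed, holding_cost=1.0, lost_sales_cost=4.0))
```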

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

no code implementations · NeurIPS 2017 · Shipra Agrawal, Randy Jia

Our main result is a high probability regret upper bound of $\tilde{O}(D\sqrt{SAT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$, when $T\ge S^5A$.

Tasks: Reinforcement Learning (RL) +1

Posterior sampling for reinforcement learning: worst-case regret bounds

no code implementations · 19 May 2017 · Shipra Agrawal, Randy Jia

We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter.

Tasks: Reinforcement Learning (RL) +1
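The following is a textbook-style sketch of the posterior-sampling idea underlying the two papers above: maintain a Dirichlet posterior over transition probabilities, sample a model, plan in it, and act greedily. The environment callback and the episodic structure are hypothetical simplifications for demonstration, not the specific algorithms or regret analyses from the papers.

```python
import numpy as np

def psrl_round(counts, rewards_sum, horizon, env_step, rng):
    """One round of a minimal posterior-sampling (Thompson sampling) loop for
    a tabular MDP: sample transitions from a Dirichlet posterior, plan by
    finite-horizon value iteration in the sampled model, then act greedily."""
    n_states, n_actions, _ = counts.shape
    # Sample a model from the posterior (Dirichlet(1) prior over transitions).
    P = np.zeros_like(counts)
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = rng.dirichlet(counts[s, a] + 1.0)
    R = rewards_sum / np.maximum(counts.sum(axis=2), 1.0)  # mean-reward estimates
    # Finite-horizon value iteration in the sampled model.
    V = np.zeros(n_states)
    for _ in range(horizon):
        Q = R + P @ V
        V = Q.max(axis=1)
    # Act greedily in the real environment and update the posterior statistics.
    s = 0
    for _ in range(horizon):
        a = int(np.argmax(Q[s]))
        s_next, r = env_step(s, a, rng)
        counts[s, a, s_next] += 1
        rewards_sum[s, a] += r
        s = s_next

# Example: a 3-state toy environment (hypothetical, for demonstration only).
def env_step(s, a, rng):
    s_next = (s + 1) % 3 if a == 1 else int(rng.integers(3))
    return s_next, float(s_next == 2)

rng = np.random.default_rng(0)
counts = np.zeros((3, 2, 3))
rewards_sum = np.zeros((3, 2))
for _ in range(50):
    psrl_round(counts, rewards_sum, horizon=10, env_step=env_step, rng=rng)
print(rewards_sum.sum())
```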
