Search Results for author: Randy Jia

Found 6 papers, 0 papers with code

Learning an Inventory Control Policy with General Inventory Arrival Dynamics

no code implementations · 26 Oct 2023 · Sohrab Andaz, Carson Eisenach, Dhruv Madeka, Kari Torkkola, Randy Jia, Dean Foster, Sham Kakade

In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term a quantity-over-time arrivals model (QOT).
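As a rough illustration of this setting, the sketch below simulates a lost-sales inventory system in which each order's quantity arrives spread over several future periods rather than all at once after a lead time. The Dirichlet arrival split, the Poisson demand, and the reward terms are illustrative assumptions, not the paper's learned QOT model.

```python
import numpy as np

def sample_qot_arrivals(order_qty, max_lead, rng):
    """Illustrative quantity-over-time arrivals: spread an order's quantity
    across several future periods via a random Dirichlet split (an assumed
    form, not the paper's learned arrival model)."""
    fractions = rng.dirichlet(np.ones(max_lead))
    return order_qty * fractions  # arrivals[t] = units arriving t+1 periods out

def simulate_inventory(policy, horizon=52, max_lead=4, seed=0):
    """Roll out a simple lost-sales inventory system under QOT-style arrivals."""
    rng = np.random.default_rng(seed)
    on_hand, pipeline = 0.0, np.zeros(max_lead)
    total_reward = 0.0
    for _ in range(horizon):
        on_hand += pipeline[0]                      # receive today's arrivals
        pipeline = np.append(pipeline[1:], 0.0)     # shift the arrival pipeline
        demand = rng.poisson(10)                    # placeholder demand process
        sales = min(on_hand, demand)                # unmet demand is lost
        on_hand -= sales
        order_qty = policy(on_hand, pipeline)       # e.g. a base-stock rule
        pipeline += sample_qot_arrivals(order_qty, max_lead, rng)
        total_reward += sales - 0.1 * on_hand       # revenue minus holding cost
    return total_reward

# Example: backtest a simple base-stock policy under the simulated dynamics.
print(simulate_inventory(lambda inv, pipe: max(0.0, 40 - inv - pipe.sum())))
```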

Contextual Bandits for Evaluating and Improving Inventory Control Policies

no code implementations · 24 Oct 2023 · Dean Foster, Randy Jia, Dhruv Madeka

Solutions to the periodic review inventory control problem with nonstationary random demand, lost sales, and stochastic vendor lead times typically involve making strong assumptions about the dynamics for either approximation or simulation, and applying methods such as optimization, dynamic programming, or reinforcement learning.

Task: Multi-Armed Bandits
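To make the bandit framing concrete, here is a minimal LinUCB-style contextual bandit sketch. LinUCB is a standard algorithm used purely for illustration; the interpretation of contexts as recent demand features, arms as candidate policy adjustments, and rewards as backtested outcomes is an assumption, not the paper's construction.

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB contextual bandit (a standard algorithm, shown only
    as an illustration of the evaluate-and-improve framing)."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Predicted reward plus an exploration bonus.
            scores.append(context @ theta + self.alpha * np.sqrt(context @ A_inv @ context))
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Example loop: contexts could summarize demand and inventory position,
# arms candidate adjustments to a baseline ordering policy; the reward
# below is a stand-in signal for demonstration only.
rng = np.random.default_rng(0)
bandit = LinUCB(n_arms=3, dim=4)
for _ in range(500):
    x = rng.normal(size=4)
    arm = bandit.select(x)
    reward = x[arm] + rng.normal(scale=0.1)
    bandit.update(arm, x, reward)
```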

Linear Reinforcement Learning with Ball Structure Action Space

no code implementations · 14 Nov 2022 · Zeyu Jia, Randy Jia, Dhruv Madeka, Dean P. Foster

We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e., assuming the optimal action-value function is linear in a known $d$-dimensional feature mapping.

Task: Reinforcement Learning (RL)
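A small sketch of what the linear, ball-structured setting can look like: when the action-value estimate is linear in the features and the action set is a Euclidean ball, the greedy action has a closed form along the estimated parameter direction. The additive feature form phi(s) + a below is an assumption made for illustration, not necessarily the paper's mapping.

```python
import numpy as np

def greedy_ball_action(theta, radius=1.0):
    """Greedy action when Q(s, a) = <phi(s) + a, theta> is linear and the
    action set is a Euclidean ball: the maximizer of a linear function over
    a ball is the radius times theta's direction (illustrative feature form)."""
    return radius * theta / (np.linalg.norm(theta) + 1e-12)

# Example with a 5-dimensional feature mapping.
d = 5
theta_hat = np.random.default_rng(0).normal(size=d)  # estimated linear parameter
phi_s = np.zeros(d)                                  # placeholder state features
a_star = greedy_ball_action(theta_hat, radius=1.0)
q_value = (phi_s + a_star) @ theta_hat               # linear action-value estimate
print(a_star, q_value)
```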

Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management

no code implementations · 10 May 2019 · Shipra Agrawal, Randy Jia

We consider the relatively less studied problem of designing a learning algorithm for inventory management when the underlying demand distribution is unknown.

Task: Management
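For intuition about acting with an unknown demand distribution, the sketch below computes a sample-based order-up-to level via the standard newsvendor critical fractile. This is a generic estimate shown only to make the setting concrete; it is not the regret-minimizing algorithm analyzed in the paper.

```python
import numpy as np

def empirical_base_stock(demand_samples, holding_cost, lost_sales_cost):
    """Sample-based base-stock level: the critical-fractile quantile of
    observed demand (classic newsvendor estimate, for illustration only)."""
    q = lost_sales_cost / (lost_sales_cost + holding_cost)  # critical fractile
    return float(np.quantile(demand_samples, q))

# Example: estimate an order-up-to level from observed demands.
rng = np.random.default_rng(0)
observed = rng.poisson(lam=20, size=200)
print(empirical_base_stock(observed, holding_cost=1.0, lost_sales_cost=4.0))
```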

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

no code implementations · NeurIPS 2017 · Shipra Agrawal, Randy Jia

Our main result is a high probability regret upper bound of $\tilde{O}(D\sqrt{SAT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$, when $T\ge S^5A$.

Tasks: Reinforcement Learning (RL) +1

Posterior sampling for reinforcement learning: worst-case regret bounds

no code implementations · 19 May 2017 · Shipra Agrawal, Randy Jia

We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter.

Tasks: Reinforcement Learning (RL) +1
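The following is a textbook-style sketch of the posterior-sampling idea underlying the two papers above: maintain a Dirichlet posterior over transition probabilities, sample a model, plan in it, and act greedily. The environment callback and the episodic structure are hypothetical simplifications for demonstration, not the specific algorithms or regret analyses from the papers.

```python
import numpy as np

def psrl_round(counts, rewards_sum, horizon, env_step, rng):
    """One round of a minimal posterior-sampling (Thompson sampling) loop for
    a tabular MDP: sample transitions from a Dirichlet posterior, plan by
    finite-horizon value iteration in the sampled model, then act greedily."""
    n_states, n_actions, _ = counts.shape
    # Sample a model from the posterior (Dirichlet(1) prior over transitions).
    P = np.zeros_like(counts)
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = rng.dirichlet(counts[s, a] + 1.0)
    R = rewards_sum / np.maximum(counts.sum(axis=2), 1.0)  # mean-reward estimates
    # Finite-horizon value iteration in the sampled model.
    V = np.zeros(n_states)
    for _ in range(horizon):
        Q = R + P @ V
        V = Q.max(axis=1)
    # Act greedily in the real environment and update the posterior statistics.
    s = 0
    for _ in range(horizon):
        a = int(np.argmax(Q[s]))
        s_next, r = env_step(s, a, rng)
        counts[s, a, s_next] += 1
        rewards_sum[s, a] += r
        s = s_next

# Example: a 3-state toy environment (hypothetical, for demonstration only).
def env_step(s, a, rng):
    s_next = (s + 1) % 3 if a == 1 else int(rng.integers(3))
    return s_next, float(s_next == 2)

rng = np.random.default_rng(0)
counts = np.zeros((3, 2, 3))
rewards_sum = np.zeros((3, 2))
for _ in range(50):
    psrl_round(counts, rewards_sum, horizon=10, env_step=env_step, rng=rng)
print(rewards_sum.sum())
```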
