no code implementations • 20 Nov 2023 • Omkar Shelke, Pranavi Pathakota, Anandsingh Chauhan, Harshad Khadilkar, Hardik Meisheri, Balaraman Ravindran
This paper presents an integrated algorithmic framework for minimising product delivery costs in e-commerce (known as the cost-to-serve or C2S).
no code implementations • 3 Nov 2023 • Durgesh Kalwar, Omkar Shelke, Harshad Khadilkar
We consider the inventory management problem, where the goal is to balance conflicting objectives such as availability and wastage of a large range of products in a store.
no code implementations • 2 Mar 2022 • Durgesh Kalwar, Omkar Shelke, Somjit Nath, Hardik Meisheri, Harshad Khadilkar
Exploration methods have been used to sample better trajectories in large environments, while auxiliary tasks have been incorporated where the reward is sparse.
no code implementations • 23 Feb 2021 • Omkar Shelke, Hardik Meisheri, Harshad Khadilkar
In this paper, we focus on developing a curriculum for learning a robust and promising policy within a constrained computational budget of 100,000 games, starting from a fixed base policy (which is itself trained to imitate a noisy expert policy).
no code implementations • 12 Nov 2019 • Hardik Meisheri, Omkar Shelke, Richa Verma, Harshad Khadilkar
Our methodology involves training an agent initially through imitation learning on a noisy expert policy, followed by reinforcement learning with the proximal policy optimization (PPO) algorithm.