no code implementations • 6 Jul 2020 • Chandramouli Kamanchi, Gopinath Ashok Kumar, Nachiappan Sundaram, Ravindra Babu T, Chaithanya Bandi
We describe a supply chain optimization model deployed at Myntra, an online fashion e-commerce company in India.
1 code implementation • 13 Nov 2019 • Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar
In this work, we propose a convergent on-line off-policy TD algorithm under linear function approximation.
1 code implementation • 1 Nov 2019 • Indu John, Chandramouli Kamanchi, Shalabh Bhatnagar
In most RL algorithms such as Q-learning, the Bellman equation and the Bellman operator play an important role.
no code implementations • 16 Jun 2019 • Raghuram Bharadwaj Diddigi, Chandramouli Kamanchi, Shalabh Bhatnagar
This problem is formulated as a min-max Markov game in the literature.
2 code implementations • 10 May 2019 • Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar
In this work, we propose a second-order value iteration procedure obtained by applying the Newton-Raphson method to the successive relaxation value iteration scheme.
no code implementations • 9 Mar 2019 • Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Shalabh Bhatnagar
We first derive a modified fixed point iteration for successive over-relaxation (SOR) Q-values and then use stochastic approximation to derive a learning algorithm that computes the optimal value function and an optimal policy.
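The fixed point iteration mentioned in this abstract can be illustrated with a minimal sketch. It assumes a known tabular model and a relaxed Bellman update of the form Q(s,a) = w (r(s,a) + γ Σ_t p(t|s,a) max_b Q(t,b)) + (1 - w) max_c Q(s,c). That form, the relaxation bound in `max_relaxation`, and the toy MDP are assumptions made for illustration; the listing itself does not confirm them, and all function names are hypothetical. The learning algorithm in the paper works from samples via stochastic approximation, which this model-based sketch does not attempt.

```python
def bellman_backup(P, R, gamma, V, s, a):
    """One-step lookahead r(s,a) + gamma * sum_t p(t|s,a) * V(t)."""
    return R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(len(V)))


def max_relaxation(P, gamma):
    """Assumed upper bound on w: min over (s,a) of 1 / (1 - gamma * p(s|s,a))."""
    return min(
        1.0 / (1.0 - gamma * P[s][a][s])
        for s in range(len(P))
        for a in range(len(P[s]))
    )


def sor_q_iteration(P, R, gamma, w, iters=500):
    """Relaxed fixed point iteration on Q-values (assumed form, see lead-in)."""
    n_s, n_a = len(R), len(R[0])
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        V = [max(row) for row in Q]  # V(s) = max_a Q(s, a)
        Q = [
            [w * bellman_backup(P, R, gamma, V, s, a) + (1.0 - w) * V[s]
             for a in range(n_a)]
            for s in range(n_s)
        ]
    return Q


def value_iteration(P, R, gamma, iters=500):
    """Standard value iteration, used here only as a reference."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    for _ in range(iters):
        V = [max(bellman_backup(P, R, gamma, V, s, a) for a in range(n_a))
             for s in range(n_s)]
    return V


# Hypothetical 2-state, 2-action MDP: P[s][a][t] and R[s][a].
P = [[[0.5, 0.5], [0.9, 0.1]],
     [[0.2, 0.8], [0.6, 0.4]]]
R = [[1.0, 0.0],
     [0.5, 2.0]]
gamma = 0.9

w = max_relaxation(P, gamma)          # largest relaxation allowed by the assumed bound
Q_sor = sor_q_iteration(P, R, gamma, w)
V_ref = value_iteration(P, R, gamma)
V_sor = [max(row) for row in Q_sor]   # value function recovered from SOR Q-values
```

Under these assumptions, taking the maximum over actions of the relaxed Q-values recovers the same value function as standard value iteration, while the per-step contraction factor drops from γ to 1 - w + wγ whenever w > 1.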
no code implementations • 11 Feb 2019 • Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi, Prabuchandran K. J., Shalabh Bhatnagar
In many practical applications, the analytical form of the density is unknown and only samples from the distribution are available.