1 code implementation • 23 Feb 2024 • Swaroop Nath, Tejpalsingh Siledar, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, Harshad Khadilkar, Pushpak Bhattacharyya, Suman Banerjee, Amey Patil, Sudhanshu Shekhar Singh, Muthusamy Chelliah, Nikesh Garera
While this strategy has proven effective, the training methodology requires a large amount of human preference annotation (typically on the order of tens of thousands of examples) to train $\varphi$.
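Reward models such as $\varphi$ are commonly fit to those preference annotations with a pairwise (Bradley-Terry style) loss. A minimal sketch of that standard objective, with illustrative scores (this is the generic formulation, not necessarily the paper's exact setup):

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Pairwise preference loss for a reward model:
    -log(sigmoid(r_chosen - r_rejected)), averaged over annotated pairs."""
    def nll(rc, rr):
        # -log(sigmoid(rc - rr)) == log(1 + exp(rr - rc))
        return math.log1p(math.exp(rr - rc))
    return sum(nll(rc, rr) for rc, rr in zip(r_chosen, r_rejected)) / len(r_chosen)

# Toy usage: model scores for three annotated preference pairs.
loss = preference_loss([1.2, 0.4, 2.0], [0.3, 0.9, 1.1])
```

The loss shrinks as the model scores the preferred response above the rejected one, which is why tens of thousands of annotated pairs are typically needed for a reliable gradient signal.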
1 code implementation • 23 Feb 2024 • Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya
Expressivity of a neural network is the class of functions it can approximate.
1 code implementation • 29 Nov 2023 • Swaroop Nath, Harshad Khadilkar, Pushpak Bhattacharyya
Query-focused Summarization (QfS) deals with systems that generate summaries from document(s) based on a query.
no code implementations • 20 Nov 2023 • Omkar Shelke, Pranavi Pathakota, Anandsingh Chauhan, Harshad Khadilkar, Hardik Meisheri, Balaraman Ravindran
This paper presents an integrated algorithmic framework for minimising product delivery costs in e-commerce (known as the cost-to-serve or C2S).
no code implementations • 3 Nov 2023 • Durgesh Kalwar, Omkar Shelke, Harshad Khadilkar
We consider the inventory management problem, where the goal is to balance conflicting objectives such as availability and wastage of a large range of products in a store.
no code implementations • 11 Jul 2023 • Harshad Khadilkar
The key idea rests on the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, with the layer's parameters (weights and biases) as the coefficients.
no code implementations • 28 Jun 2023 • Pranavi Pathakota, Hardik Meisheri, Harshad Khadilkar
The ability to learn robust policies while generalizing over large discrete action spaces is an open challenge for intelligent systems, especially in noisy environments that face the curse of dimensionality.
no code implementations • 10 May 2023 • Harshad Khadilkar
We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL), through the use of evolutionary operators.
no code implementations • 28 Oct 2022 • Harshad Khadilkar, Hardik Meisheri
A significant challenge in reinforcement learning is quantifying the complex relationship between actions and long-term rewards.
1 code implementation • 28 Jul 2022 • Ramya S. Hebbalaguppe, Soumya Suvra Goshal, Jatin Prakash, Harshad Khadilkar, Chetan Arora
One of the major advantages of CnC is that it does not require any hold-out data apart from the training set.
no code implementations • 14 Jun 2022 • Harshad Khadilkar
The vehicle routing problem is a well-known class of NP-hard combinatorial optimisation problems in the literature.
no code implementations • 2 Mar 2022 • Durgesh Kalwar, Omkar Shelke, Somjit Nath, Hardik Meisheri, Harshad Khadilkar
Exploration methods have been used to sample better trajectories in large environments, while auxiliary tasks have been incorporated where the reward is sparse.
no code implementations • 2 Mar 2022 • Hardik Meisheri, Somjit Nath, Mayank Baranwal, Harshad Khadilkar
Through empirical evaluations, it is further shown that not only is inventory management with uncertain lead times equivalent to that with delays in information sharing across multiple echelons (\emph{observation delay}), but also that a model trained to handle one kind of delay is capable of handling the other kind without retraining.
no code implementations • 16 Dec 2021 • Pranavi Pathakota, Kunwar Zaid, Anulekha Dhara, Hardik Meisheri, Shaun D Souza, Dheeraj Shah, Harshad Khadilkar
We describe a novel decision-making problem developed in response to the demands of retail electronic commerce (e-commerce).
no code implementations • 7 Dec 2021 • Supratim Ghosh, Aritra Pal, Prashant Kumar, Ankush Ojha, Aditya Paranjape, Souvik Barat, Harshad Khadilkar
Parcel sorting operations in logistics enterprises aim to achieve a high throughput of parcels through sorting centers.
1 code implementation • 17 Aug 2021 • Somjit Nath, Mayank Baranwal, Harshad Khadilkar
Several real-world scenarios, such as remote control and sensing, involve both action and observation delays.
no code implementations • 24 Feb 2021 • Nazneen N Sultana, Vinita Baniwal, Ansuma Basumatary, Piyush Mittal, Supratim Ghosh, Harshad Khadilkar
This paper develops an inherently parallelised, fast, approximate learning-based solution to the generic class of Capacitated Vehicle Routing Problems with Time Windows and Dynamic Routing (CVRP-TWDR).
no code implementations • 23 Feb 2021 • Omkar Shelke, Hardik Meisheri, Harshad Khadilkar
In this paper, we focus on developing a curriculum for learning a robust and promising policy in a constrained computational budget of 100,000 games, starting from a fixed base policy (which is itself trained to imitate a noisy expert policy).
no code implementations • 1 Nov 2020 • Hardik Meisheri, Harshad Khadilkar
We describe our solution approach for Pommerman TeamRadio, a competition environment associated with NeurIPS 2019.
no code implementations • 1 Jul 2020 • Richa Verma, Aniruddha Singhal, Harshad Khadilkar, Ansuma Basumatary, Siddharth Nayak, Harsh Vardhan Singh, Swagat Kumar, Rajesh Sinha
We propose a Deep Reinforcement Learning (Deep RL) algorithm for solving the online 3D bin packing problem for an arbitrary number of bins and any bin size.
1 code implementation • 7 Jun 2020 • Nazneen N Sultana, Hardik Meisheri, Vinita Baniwal, Somjit Nath, Balaraman Ravindran, Harshad Khadilkar
This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains.
no code implementations • 21 Apr 2020 • Somjit Nath, Richa Verma, Abhik Ray, Harshad Khadilkar
We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE.
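Going by the description above, SIBRE shapes rewards based on the agent's improvement over its own past performance. A heavily hedged sketch of that idea (the baseline choice, window, and ±1 shaping rule here are illustrative assumptions, not the paper's exact formulation):

```python
from collections import deque

def self_improvement_reward(episode_return, history):
    """Shaped reward based on self-improvement: compare this episode's
    return against a running baseline of the agent's own recent returns.
    The specific baseline (windowed mean) and +/-1 signal are assumptions."""
    baseline = sum(history) / len(history) if history else 0.0
    shaped = 1.0 if episode_return > baseline else -1.0
    history.append(episode_return)  # update the baseline window
    return shaped

# Toy usage over four episode returns, with a 10-episode baseline window.
history = deque(maxlen=10)
shaped = [self_improvement_reward(r, history) for r in [1.0, 2.0, 0.5, 3.0]]
```

Shaping against the agent's own past performance keeps the shaped signal well-scaled throughout training, which is one plausible route to the faster convergence claimed.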
1 code implementation • 31 Mar 2020 • Harshad Khadilkar, Tanuja Ganu, Deva P Seetharam
In the context of the ongoing Covid-19 pandemic, several reports and studies have attempted to model and predict the spread of the disease.
no code implementations • 12 Nov 2019 • Hardik Meisheri, Omkar Shelke, Richa Verma, Harshad Khadilkar
Our methodology involves training an agent initially through imitation learning on a noisy expert policy, followed by reinforcement learning with proximal policy optimization (PPO).
no code implementations • 1 Oct 2019 • Hardik Meisheri, Vinita Baniwal, Nazneen N Sultana, Balaraman Ravindran, Harshad Khadilkar
This paper describes a purely data-driven solution to a class of sequential decision-making problems with a large number of concurrent online decisions, with applications to computing systems and operations research.
no code implementations • WS 2018 • Hardik Meisheri, Harshad Khadilkar
Most existing state-of-the-art sentiment classification techniques rely on pre-trained embeddings.