no code implementations • 16 Apr 2024 • Caroline Wang, Arrasy Rahman, Ishan Durugkar, Elad Liebman, Peter Stone
POAM is a policy-gradient, multi-agent reinforcement learning approach to the NAHT problem that enables adaptation to diverse teammates by learning representations of their behaviors.
no code implementations • 10 Oct 2023 • Siddhant Agarwal, Ishan Durugkar, Peter Stone, Amy Zhang
We further introduce an entropy-regularized policy optimization objective, which we call $state$-MaxEnt RL (or $s$-MaxEnt RL), as a special case of our objective.
no code implementations • 8 Nov 2022 • Eddy Hudson, Ishan Durugkar, Garrett Warnell, Peter Stone
Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum likelihood policy indicated by this data.
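In the tabular case, one way to read "the maximum likelihood policy indicated by this data" is behavioral cloning by normalized action counts. A minimal sketch of that reading (function and variable names are hypothetical, not from the paper):

```python
from collections import Counter, defaultdict

def mle_policy(demos):
    """Estimate the maximum-likelihood tabular policy from expert
    (state, action) pairs: normalized per-state action counts."""
    counts = defaultdict(Counter)
    for s, a in demos:
        counts[s][a] += 1
    policy = {}
    for s, ctr in counts.items():
        total = sum(ctr.values())
        policy[s] = {a: n / total for a, n in ctr.items()}
    return policy

# Toy dataset: the expert picks "right" in state 0 three times out of four.
demos = [(0, "right"), (0, "right"), (0, "right"), (0, "left"), (1, "up")]
pi = mle_policy(demos)  # pi[0]["right"] == 0.75
```

With function approximation, the same idea becomes maximizing log-likelihood of the expert actions under a parameterized policy.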
1 code implementation • 1 Jun 2022 • Caroline Wang, Ishan Durugkar, Elad Liebman, Peter Stone
The theoretical analysis shows that, under certain conditions, each agent minimizing its individual distribution mismatch yields convergence to the joint policy that generated the target distribution.
no code implementations • 28 Oct 2021 • Ishan Durugkar, Steven Hansen, Stephen Spencer, Volodymyr Mnih
This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal.
1 code implementation • NeurIPS 2021 • Ishan Durugkar, Mauricio Tec, Scott Niekum, Peter Stone
In this paper, we investigate whether one such objective, the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution, can be utilized effectively for reinforcement learning (RL) tasks.
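For distributions over an ordered one-dimensional support, the Wasserstein-1 distance has a simple closed form: the L1 distance between the two CDFs. A minimal illustration of that special case (not the paper's estimator, which must handle general state spaces):

```python
import numpy as np

def wasserstein1_1d(p, q):
    """Wasserstein-1 distance between two distributions over the same
    ordered 1-D support with unit spacing: sum of |CDF_p - CDF_q|."""
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

# Visitation mass concentrated on state 0 vs. a target on state 2:
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 1.0])
d = wasserstein1_1d(p, q)  # all the mass must travel 2 states, so d == 2.0
```

Unlike KL divergence, this distance stays finite and informative when the two distributions have disjoint support, which is what makes it attractive as an RL objective.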
no code implementations • ICML 2020 • Brahma Pavse, Ishan Durugkar, Josiah Hanna, Peter Stone
In this batch setting, we show that TD(0) may converge to an inaccurate value function because the update following an action is weighted according to the number of times that action occurred in the batch, rather than the true probability of the action under the given policy.
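A tiny numerical sketch of that failure mode (toy setup, not the paper's experiments): one state, one-step terminal episodes, and a true policy that picks a reward-1 and a reward-0 action with equal probability, so the true value is 0.5. If the batch happens to contain the reward-1 action twice, repeated TD(0) sweeps converge toward the empirical batch frequency instead.

```python
def batch_td0_value(batch_rewards, alpha=0.01, sweeps=2000):
    """Repeated TD(0) sweeps over a fixed batch of one-step episodes
    from a single state; transitions are terminal, so each TD target
    is just the observed reward."""
    v = 0.0
    for _ in range(sweeps):
        for r in batch_rewards:
            v += alpha * (r - v)
    return v

# Batch over-represents the reward-1 action (2 of 3 transitions),
# so v converges near the batch mean 2/3, not the true value 0.5.
v_batch = batch_td0_value([1.0, 1.0, 0.0])
```

The estimate is biased toward 2/3 purely because of the batch's empirical action counts, which is the weighting problem the paper identifies.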
no code implementations • NeurIPS 2020 • Siddharth Desai, Ishan Durugkar, Haresh Karnan, Garrett Warnell, Josiah Hanna, Peter Stone
We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning.
no code implementations • ICLR 2019 • Ishan Durugkar, Bo Liu, Peter Stone
Temporal Difference learning with function approximation has recently seen wide use and has contributed to several notable successes.
no code implementations • 5 Apr 2019 • Ishan Durugkar, Matthew Hausknecht, Adith Swaminathan, Patrick MacAlpine
Policy gradient algorithms typically combine discounted future rewards with an estimated value function to compute the direction and magnitude of parameter updates.
no code implementations • ICLR 2018 • Ishan Durugkar, Peter Stone
In this work we propose a constraint on the TD update that minimizes change to the target values.
7 code implementations • ICLR 2018 • Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum
Knowledge bases (KBs), whether automatically or manually constructed, are often incomplete: many valid facts can be inferred from a KB by synthesizing existing information.
1 code implementation • 5 Nov 2016 • Ishan Durugkar, Ian Gemp, Sridhar Mahadevan
Generative adversarial networks (GANs) are a framework for producing a generative model by way of a two-player minimax game.
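The two-player minimax game can be made concrete with the standard GAN losses: the discriminator is trained to label real samples 1 and generated samples 0, while the generator is trained to push the discriminator's output on its samples toward 1. A minimal sketch of those objectives on raw probabilities (an illustration of the standard framework, not this paper's specific variant):

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy on probabilities (eps avoids log(0))."""
    eps = 1e-12
    return float(-(target * np.log(pred + eps)
                   + (1 - target) * np.log(1 - pred + eps)).mean())

def gan_losses(d_real, d_fake):
    """Discriminator loss: real -> 1, fake -> 0.
    Generator loss: fake -> 1 (the common non-saturating form)."""
    d_loss = bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))
    g_loss = bce(d_fake, np.ones_like(d_fake))
    return d_loss, g_loss

# A confident discriminator (0.9 on real, 0.1 on fake) has low loss,
# while the generator's loss is correspondingly high.
d_loss, g_loss = gan_losses(np.array([0.9]), np.array([0.1]))
```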
Ranked #67 on Image Generation on CIFAR-10 (Inception score metric)
no code implementations • 21 Aug 2016 • Ian Gemp, Ishan Durugkar, Mario Parente, M. Darby Dyar, Sridhar Mahadevan
Recent advances in semi-supervised learning with deep generative models have shown promise in generalizing from small labeled datasets ($\mathbf{x},\mathbf{y}$) to large unlabeled ones ($\mathbf{x}$).