no code implementations • 12 Apr 2024 • Mudit Verma, Katherine Metcalf
Incorporating state importance into reward learning improves the speed of policy learning, overall policy performance, and reward recovery on both locomotion and manipulation tasks.
no code implementations • 2 Feb 2024 • Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Kaya Stechly, Mudit Verma, Siddhant Bhambri, Lucas Saldyt, Anil Murthy
On the other side are perhaps over-pessimistic claims that all LLMs are good for in planning/reasoning tasks is translating the problem specification from one syntactic format to another and shipping the problem off to external symbolic solvers.
no code implementations • 10 Jan 2024 • Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
In this work, we explore the task of Perceived Behavior Recognition, where a robot employs a Large Language Model (LLM) to assess its generated behavior in a manner similar to a human observer.
no code implementations • 21 Dec 2023 • Siddhant Bhambri, Mudit Verma, Anil Murthy, Subbarao Kambhampati
We introduce the notion of Human-Flexibility, i.e., whether the human partner is amenable to multiple team strategies, with a special case being Specified Orchestration, where the human has a single team policy in mind (the most constrained case).
no code implementations • 28 Feb 2023 • Tung Thai, Ming Shen, Mayank Garg, Ayush Kalani, Nakul Vaidya, Utkarsh Soni, Mudit Verma, Sriram Gopalakrishnan, Neeraj Varshney, Chitta Baral, Subbarao Kambhampati, Jivko Sinapov, Matthias Scheutz
Learning to detect, characterize, and accommodate novelties is a challenge that agents operating in open-world domains must address to guarantee satisfactory task performance.
no code implementations • 17 Feb 2023 • Mudit Verma, Subbarao Kambhampati
Reinforcement Learning has long suffered from poor reward specification and from reward-hacking issues, even in fairly simple domains.
no code implementations • 17 Feb 2023 • Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati
Preference-Based Reinforcement Learning (PbRL) has shown much promise for utilizing human binary feedback on queried trajectory pairs to recover the underlying reward model of the Human in the Loop (HiL).
no code implementations • 17 Feb 2023 • Mudit Verma, Subbarao Kambhampati
We propose a data-driven reward initialization method that adds no cost to the human in the loop and negligible cost to the PbRL agent. Initializing the reward model this way ensures that its predicted rewards are uniform over the state space, which reduces the variability in the method's performance across multiple runs and improves overall performance compared to other initialization methods.
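A minimal sketch of what such a data-driven initialization could look like, assuming a linear reward model fit by ridge regression to a constant target over sampled states; the linear model, the regression approach, and all names are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

# Illustrative sketch: initialize a linear reward model r(s) = w.s + b so that
# its predicted rewards are (near-)uniform over sampled task states before any
# preference-based training begins.

def initialize_reward_model(states, target=0.5, ridge=1e-3):
    """Ridge-regress [w, b] so that r(s) ~= target for every sampled state."""
    n, d = states.shape
    X = np.hstack([states, np.ones((n, 1))])        # append a bias column
    y = np.full(n, target)
    # Closed-form ridge solution: (X^T X + lambda*I)^-1 X^T y
    wb = np.linalg.solve(X.T @ X + ridge * np.eye(d + 1), X.T @ y)
    return wb[:-1], wb[-1]

def predict_reward(w, b, states):
    return states @ w + b

rng = np.random.default_rng(0)
states = rng.normal(size=(256, 4))                  # states sampled from the task
w, b = initialize_reward_model(states)
preds = predict_reward(w, b, states)
print(preds.std())                                  # spread of predicted rewards
```

The point of the sketch: predictions start out tightly clustered around the target, so early preference queries are compared against a flat prior rather than arbitrary random-initialization outputs.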
no code implementations • 27 Oct 2022 • Utkarsh Soni, Nupur Thakur, Sarath Sreedharan, Lin Guan, Mudit Verma, Matthew Marquez, Subbarao Kambhampati
If the relevant concept is not in the shared vocabulary, then it is learned.
no code implementations • 17 Oct 2022 • Mudit Verma, Katherine Metcalf
Specifying rewards for reinforcement learning (RL) agents is challenging.
no code implementations • 7 Oct 2022 • Mudit Verma, Ayush Kharkwal, Subbarao Kambhampati
Through our experiments, we show that our method can provide an interpretable means of solving the Advice-Conformance Verification problem by conveying whether or not the agent is using the human's advice.
no code implementations • 21 Sep 2021 • Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan
The jury is still out on whether AI systems will need to use symbols in their internal reasoning to achieve general intelligence capabilities.
1 code implementation • 15 Sep 2021 • Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati
We present a framework that models the human agent's behavior with respect to state uncertainty and can be used to compute MDP policies that account for these problems.
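One way to picture planning against a model of the human's state uncertainty is to run value iteration on rewards averaged under the human's belief over states; the confusion matrix, sizes, and random quantities below are toy assumptions, not the paper's actual framework:

```python
import numpy as np

# Toy sketch: value iteration on a small MDP where the reward used for
# planning is an expectation under the human observer's (possibly confused)
# belief about the current state, rather than the true state alone.

n_states, n_actions = 4, 2
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(size=(n_states, n_actions))                       # true reward
# Human confusion matrix: B[s, s2] = P(human believes s2 | true state s)
B = np.full((n_states, n_states), 0.1 / (n_states - 1))
np.fill_diagonal(B, 0.9)
R_perceived = B @ R          # reward as expected under the human's belief

def value_iteration(P, R, gamma=0.9, iters=200):
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        Q = R + gamma * np.einsum('san,n->sa', P, V)  # Bellman backup
        V = Q.max(axis=1)
    return Q.argmax(axis=1)   # greedy policy: one action per state

policy = value_iteration(P, R_perceived)
print(policy)
```

Planning on `R_perceived` instead of `R` biases the policy toward states the human is likely to evaluate correctly, which is one simple way to make state uncertainty enter the optimization.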
no code implementations • 3 May 2021 • Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati
The problem of trust management is particularly challenging in mixed human-robot teams, where the human and the robot may have different models of the task at hand and thus different expectations about the current course of action, forcing the robot to generate costly explicable behavior.
no code implementations • 22 Feb 2021 • Mudit Verma, Pradyumna Sinha, Karan Goyal, Apoorva Verma, Seba Susan
Neural networks have long been used to solve complex problems in the image domain, yet designing them still requires manual expertise.
no code implementations • 12 Jul 2020 • Mudit Verma, Arun Balaji Buduru
Hence, there is an increasing need for real-time and fine-grained content analysis services, including language identification, content transcription, and analysis.
1 code implementation • NeurIPS 2021 • Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati
We focus on the task of learning from feedback, in which the human trainer not only gives binary evaluative "good" or "bad" feedback for queried state-action pairs, but also provides a visual explanation by annotating relevant features in images.
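The two feedback channels described above can be sketched as a combined objective: a logistic loss on the binary "good"/"bad" label plus a term aligning the model's importance map with the human's annotation. The function names and the specific alignment loss are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

# Illustrative sketch: combine binary evaluative feedback with a
# human-annotated saliency mask over image features.

def logistic_loss(logit, label):
    """Standard logistic loss; label in {0, 1}: 1 = "good", 0 = "bad"."""
    return float(np.log1p(np.exp(-logit)) if label == 1 else np.log1p(np.exp(logit)))

def saliency_alignment_loss(model_map, human_mask):
    """Mean squared error between the two maps, each normalized to sum to 1."""
    m = model_map / (model_map.sum() + 1e-8)
    h = human_mask / (human_mask.sum() + 1e-8)
    return float(np.mean((m - h) ** 2))

def combined_loss(logit, label, model_map, human_mask, beta=1.0):
    """Label loss plus a weighted penalty for disagreeing with the annotation."""
    return logistic_loss(logit, label) + beta * saliency_alignment_loss(model_map, human_mask)

model_map = np.array([[0.1, 0.8], [0.05, 0.05]])   # model's importance map
human_mask = np.array([[0.0, 1.0], [0.0, 0.0]])    # human's annotation
loss = combined_loss(logit=2.0, label=1, model_map=model_map, human_mask=human_mask)
print(loss)
```

The alignment term is what lets the visual explanation shape learning: a model that attends to the annotated region pays a smaller penalty than one whose importance map lands elsewhere.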
no code implementations • ICLR 2022 • Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava, Subbarao Kambhampati
As increasingly complex AI systems are introduced into our daily lives, it becomes important for such systems to be capable of explaining the rationale for their decisions and allowing users to contest these decisions.
no code implementations • 6 Dec 2019 • Mudit Verma, Siddhant Bhambri, Saurabh Gupta, Arun Balaji Buduru
Rapid advancements in the Internet of Things (IoT) have facilitated more efficient deployment of smart-environment solutions for specific user requirements.