Search Results for author: Mudit Verma

Found 19 papers, 2 papers with code

Hindsight PRIORs for Reward Learning from Human Preferences

no code implementations12 Apr 2024 Mudit Verma, Katherine Metcalf

Incorporating state importance into reward learning improves the speed of policy learning, overall policy performance, and reward recovery on both locomotion and manipulation tasks.

LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks

no code implementations2 Feb 2024 Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Kaya Stechly, Mudit Verma, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

On the other side are perhaps over-pessimistic claims that all LLMs are good for in planning/reasoning tasks is acting as mere translators of the problem specification from one syntactic format to another, shipping the problem off to external symbolic solvers.

Theory of Mind abilities of Large Language Models in Human-Robot Interaction: An Illusion?

no code implementations10 Jan 2024 Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati

In this work, we explore the task of Perceived Behavior Recognition, where a robot employs a Large Language Model (LLM) to assess the robot's generated behavior in a manner similar to a human observer.

Language Modelling Large Language Model

Benchmarking Multi-Agent Preference-based Reinforcement Learning for Human-AI Teaming

no code implementations21 Dec 2023 Siddhant Bhambri, Mudit Verma, Anil Murthy, Subbarao Kambhampati

We introduce the notion of Human-Flexibility, i.e., whether the human partner is amenable to multiple team strategies, with a special case being Specified Orchestration, where the human has a single team policy in mind (the most constrained case).

Benchmarking reinforcement-learning

Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments

no code implementations28 Feb 2023 Tung Thai, Ming Shen, Mayank Garg, Ayush Kalani, Nakul Vaidya, Utkarsh Soni, Mudit Verma, Sriram Gopalakrishnan, Neeraj Varshney, Chitta Baral, Subbarao Kambhampati, Jivko Sinapov, Matthias Scheutz

Learning to detect, characterize, and accommodate novelties is a challenge that agents operating in open-world domains must address to guarantee satisfactory task performance.

Novelty Detection

A State Augmentation based approach to Reinforcement Learning from Human Preferences

no code implementations17 Feb 2023 Mudit Verma, Subbarao Kambhampati

Reinforcement Learning has suffered from poor reward specification and is prone to reward hacking, even in fairly simple domains.

reinforcement-learning Reinforcement Learning (RL)

Exploiting Unlabeled Data for Feedback Efficient Human Preference based Reinforcement Learning

no code implementations17 Feb 2023 Mudit Verma, Siddhant Bhambri, Subbarao Kambhampati

Preference-Based Reinforcement Learning has shown much promise in utilizing binary human feedback on queried trajectory pairs to recover the underlying reward model of the Human in the Loop (HiL).

reinforcement-learning Reinforcement Learning (RL)
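
The preference-learning setup described in the snippet above can be sketched with the standard Bradley-Terry model used throughout the PbRL literature: the probability that the human prefers one trajectory over another is a sigmoid of the difference in their predicted returns. The linear reward model, 2-D state features, and concrete values below are illustrative assumptions, not this paper's implementation.

```python
import math

def predicted_return(trajectory, weights):
    """Summed linear reward w . s over the states of a trajectory."""
    return sum(sum(w * f for w, f in zip(weights, state)) for state in trajectory)

def preference_prob(traj_a, traj_b, weights):
    """Bradley-Terry probability that traj_a is preferred over traj_b."""
    diff = predicted_return(traj_a, weights) - predicted_return(traj_b, weights)
    return 1.0 / (1.0 + math.exp(-diff))

def preference_loss(traj_a, traj_b, label, weights):
    """Binary cross-entropy against the human's 0/1 preference label."""
    p = preference_prob(traj_a, traj_b, weights)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

weights = [1.0, -0.5]                    # current reward-model parameters (assumed)
good = [[1.0, 0.0], [2.0, 0.0]]          # trajectory the human labeled as preferred
bad = [[0.0, 1.0], [0.0, 2.0]]
p = preference_prob(good, bad, weights)  # > 0.5: model agrees with the human here
loss = preference_loss(good, bad, 1.0, weights)
```

Minimizing this loss over queried pairs is what "recovering the underlying reward model" amounts to in this family of methods.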

Data Driven Reward Initialization for Preference based Reinforcement Learning

no code implementations17 Feb 2023 Mudit Verma, Subbarao Kambhampati

We propose a data-driven reward initialization method that adds no cost for the human in the loop and negligible cost for the PbRL agent. It ensures that the predicted rewards of the initialized reward model are uniform over the state space, which reduces the variability of the method's performance across multiple runs and is shown to improve overall performance compared to other initialization methods.

reinforcement-learning Reinforcement Learning (RL)
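
One minimal way to realize the "uniform predicted rewards" property described above is to fit the reward model's parameters so that its output matches a constant on states sampled from the environment, before any human feedback arrives. The linear reward model, the target constant, and the least-squares fit below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def uniform_reward_init(sampled_states, target=0.5):
    """Fit w, b so that r(s) = w . s + b is approximately `target` everywhere."""
    X = np.hstack([sampled_states, np.ones((len(sampled_states), 1))])
    y = np.full(len(sampled_states), target)
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    return params[:-1], params[-1]  # weights, bias

rng = np.random.default_rng(0)
states = rng.normal(size=(100, 3))      # states sampled from the environment (assumed)
w, b = uniform_reward_init(states)
preds = states @ w + b                  # near-constant over the sampled states
```

For a model with capacity to fit a constant exactly (as here), the initialized predictions are uniform, so early preference queries are resolved by the data rather than by arbitrary initial rewards.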

Symbol Guided Hindsight Priors for Reward Learning from Human Preferences

no code implementations17 Oct 2022 Mudit Verma, Katherine Metcalf

Specifying rewards for reinforcement learning (RL) agents is challenging.

Advice Conformance Verification by Reinforcement Learning agents for Human-in-the-Loop

no code implementations7 Oct 2022 Mudit Verma, Ayush Kharkwal, Subbarao Kambhampati

Through our experiments, we show that our method can provide an interpretable means of solving the Advice-Conformance Verification problem by conveying whether or not the agent is using the human's advice.

Decision Making reinforcement-learning +1

Symbols as a Lingua Franca for Bridging Human-AI Chasm for Explainable and Advisable AI Systems

no code implementations21 Sep 2021 Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, Lin Guan

The jury is still out on whether AI systems will need to use symbols in their internal reasoning to achieve general intelligence capabilities.

Computing Policies That Account For The Effects Of Human Agent Uncertainty During Execution In Markov Decision Processes

1 code implementation15 Sep 2021 Sriram Gopalakrishnan, Mudit Verma, Subbarao Kambhampati

We present a framework to model the human agent's behavior with respect to state uncertainty, which can be used to compute MDP policies that account for these problems.

Trust-Aware Planning: Modeling Trust Evolution in Iterated Human-Robot Interaction

no code implementations3 May 2021 Zahra Zahedi, Mudit Verma, Sarath Sreedharan, Subbarao Kambhampati

The problem of trust management is particularly challenging in mixed human-robot teams, where the human and the robot may have different models of the task at hand and thus different expectations about the current course of action, forcing the robot to rely on costly explicable behavior.

Management

A Novel Framework for Neural Architecture Search in the Hill Climbing Domain

no code implementations22 Feb 2021 Mudit Verma, Pradyumna Sinha, Karan Goyal, Apoorva Verma, Seba Susan

Neural networks have long been used to solve complex problems in the image domain, yet designing them requires manual expertise.

Neural Architecture Search Reinforcement Learning (RL)

Fine-grained Language Identification with Multilingual CapsNet Model

no code implementations12 Jul 2020 Mudit Verma, Arun Balaji Buduru

Hence, there is an increasing need for real-time and fine-grained content analysis services, including language identification, content transcription, and analysis.

Language Identification

Widening the Pipeline in Human-Guided Reinforcement Learning with Explanation and Context-Aware Data Augmentation

1 code implementation NeurIPS 2021 Lin Guan, Mudit Verma, Sihang Guo, Ruohan Zhang, Subbarao Kambhampati

We focus on the task of learning from feedback, in which the human trainer not only gives binary evaluative "good" or "bad" feedback for queried state-action pairs, but also provides a visual explanation by annotating relevant features in images.

Atari Games Data Augmentation +3
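
The explanation-guided augmentation described in the snippet above can be illustrated simply: given a human-annotated saliency mask of relevant pixels, perturb only the irrelevant regions so that the binary "good"/"bad" label for the state-action pair still holds for the augmented image. The array shapes, the Gaussian noise scheme, and the mask layout below are illustrative assumptions, not the paper's exact augmentation.

```python
import numpy as np

def augment_with_mask(image, relevant_mask, rng, noise_scale=0.1):
    """Copy of `image` with noise applied only outside the relevant region."""
    noise = rng.normal(0.0, noise_scale, size=image.shape)
    augmented = image + noise * (1.0 - relevant_mask)  # relevant pixels untouched
    return np.clip(augmented, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.8, size=(8, 8))  # stand-in for a game frame
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1.0                        # human-annotated relevant region
aug = augment_with_mask(image, mask, rng)
```

Each augmented frame inherits the original feedback label, multiplying the effective amount of human feedback per query.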

Bridging the Gap: Providing Post-Hoc Symbolic Explanations for Sequential Decision-Making Problems with Inscrutable Representations

no code implementations ICLR 2022 Sarath Sreedharan, Utkarsh Soni, Mudit Verma, Siddharth Srivastava, Subbarao Kambhampati

As increasingly complex AI systems are introduced into our daily lives, it becomes important for such systems to be capable of explaining the rationale for their decisions and allowing users to contest these decisions.

Decision Making Montezuma's Revenge

Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop

no code implementations6 Dec 2019 Mudit Verma, Siddhant Bhambri, Saurabh Gupta, Arun Balaji Buduru

Rapid advancements in the Internet of Things (IoT) have facilitated more efficient deployment of smart-environment solutions for specific user requirements.

Clustering Reinforcement Learning (RL)
