Search Results for author: Satinder Singh

Found 76 papers, 22 papers with code

Diversifying AI: Towards Creative Chess with AlphaZero

no code implementations 17 Aug 2023 Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones.

Decision Making, Game of Chess

A Definition of Continual Reinforcement Learning

no code implementations NeurIPS 2023 David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh

Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents.

Continual Learning, reinforcement-learning

On the Convergence of Bounded Agents

no code implementations 20 Jul 2023 David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh

Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing.

reinforcement-learning
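
The standard notion quoted above can be made concrete with a minimal sketch (assuming a tabular policy; names are illustrative): an agent has converged once its action probabilities stop changing at every state.

    import jax.numpy as jnp

    def has_converged(policy_prev, policy_curr, tol=1e-6):
        # policy_prev, policy_curr: (num_states, num_actions) action
        # probabilities from consecutive updates. Converged, in the
        # standard sense quoted above, when no state's behavior moved
        # by more than tol.
        return bool(jnp.max(jnp.abs(policy_curr - policy_prev)) < tol)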

Structured State Space Models for In-Context Reinforcement Learning

2 code implementations NeurIPS 2023 Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani

We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks.

Continuous Control, Meta-Learning, +1
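
One standard way to obtain parallel, resettable linear recurrences, sketched below as an assumption about the underlying trick rather than the authors' implementation: represent each step of x_t = A_t * x_{t-1} + b_t as a pair (a_t, b_t), zero a_t at episode boundaries so the carried state is dropped, and combine pairs associatively.

    import jax
    import jax.numpy as jnp

    def resettable_scan(A, b, reset):
        # A: (T, D) diagonal transition per step; b: (T, D) inputs;
        # reset: (T,) 1.0 where a new episode starts, else 0.0.
        a = A * (1.0 - reset)[:, None]  # drop the carry at resets

        def combine(left, right):
            a1, b1 = left
            a2, b2 = right
            # composing x -> a2 * (a1 * x + b1) + b2
            return a2 * a1, a2 * b1 + b2

        _, x = jax.lax.associative_scan(combine, (a, b))
        return x  # hidden states for all t, assuming a zero initial state

    T, D = 8, 4
    key = jax.random.PRNGKey(0)
    states = resettable_scan(jnp.full((T, D), 0.9),
                             jax.random.normal(key, (T, D)),
                             jnp.zeros(T).at[4].set(1.0))  # boundary at t=4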

Hierarchical Reinforcement Learning in Complex 3D Environments

no code implementations 28 Feb 2023 Bernardo Avila Pires, Feryal Behbahani, Hubert Soyer, Kyriacos Nikiforou, Thomas Keck, Satinder Singh

Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction, transfer, and skill reuse.

Hierarchical Reinforcement Learning, reinforcement-learning, +1

Composing Task Knowledge with Modular Successor Feature Approximators

1 code implementation 28 Jan 2023 Wilka Carvalho, Angelos Filos, Richard L. Lewis, Honglak Lee, Satinder Singh

Recently, the Successor Features and Generalized Policy Improvement (SF&GPI) framework has been proposed as a method for learning, composing, and transferring predictive knowledge and behavior.

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

no code implementations 30 Dec 2022 Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.

Meta Reinforcement Learning, Reinforcement Learning (RL)

Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction

no code implementations 30 Oct 2022 Dilip Arumugam, Satinder Singh

The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning.

Efficient Exploration, reinforcement-learning, +1
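
For background (a summary in standard notation, not the paper's own): a BAMDP folds the agent's uncertainty into the state itself, replacing s_t with the hyperstate \tilde{s}_t = (s_t, b_t), where b_t(\theta) = p(\theta | h_t) is the posterior over the unknown dynamics given the history h_t. Bayes-optimal exploration is then ordinary optimal planning in this (much larger) augmented MDP, whose belief component b_t is the expensive part to plan over.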

In-context Reinforcement Learning with Algorithm Distillation

1 code implementation 25 Oct 2022 Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model.

reinforcement-learning
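
The data preparation at the heart of AD can be sketched as follows (an illustration; field names and the flattening scheme are assumptions, and the paper's tokenization may differ): transitions from many consecutive episodes of a source agent's training run are concatenated into one long sequence, so a causal model trained to predict the next action must capture the improvement happening across episodes, not just within them.

    import jax.numpy as jnp

    def flatten_learning_history(episodes):
        # episodes: list of dicts with 'obs' (T_i, D), 'acts' (T_i,),
        # 'rews' (T_i,), ordered by when they occurred during the
        # source agent's training run.
        obs = jnp.concatenate([e["obs"] for e in episodes], axis=0)
        acts = jnp.concatenate([e["acts"] for e in episodes], axis=0)
        rews = jnp.concatenate([e["rews"] for e in episodes], axis=0)
        # a causal sequence model is then trained to predict acts[t]
        # from everything before index t
        return obs, acts, rews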

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

no code implementations 19 Oct 2022 Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh

In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.

reinforcement-learning, Reinforcement Learning (RL), +2

Meta-Gradients in Non-Stationary Environments

no code implementations 13 Sep 2022 Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.

Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

no code implementations 26 May 2022 Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh

Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations.

GrASP: Gradient-Based Affordance Selection for Planning

no code implementations 8 Feb 2022 Vivek Veeriah, Zeyu Zheng, Richard Lewis, Satinder Singh

Our empirical work shows that it is feasible to learn to select both primitive-action and option affordances, and that simultaneously learning to select affordances and planning with a learned value-equivalent model can outperform model-free RL.

Reinforcement Learning (RL)

On the Expressivity of Markov Reward

no code implementations NeurIPS 2021 David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.

Bootstrapped Meta-Learning

1 code implementation ICLR 2022 Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

We achieve a new state-of-the-art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.

Efficient Exploration, Few-Shot Learning, +1

Proper Value Equivalence

1 code implementation NeurIPS 2021 Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh

The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning.

Model-based Reinforcement Learning, Reinforcement Learning (RL)

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations ICML Workshop URL 2021 Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while ensuring that they are near optimal.

Reward is enough for convex MDPs

no code implementations NeurIPS 2021 Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP).

Reinforcement Learning (RL)
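
In standard notation (a paraphrase of the setup): writing d^\pi for the state-action occupancy measure of policy \pi, the classical RL objective is linear in d^\pi,

    max_\pi \langle r, d^\pi \rangle,

whereas a convex MDP asks for

    max_\pi f(d^\pi), with f concave

(equivalently, minimizing a convex cost of the occupancy), which covers goals such as pure exploration, apprenticeship learning, and constrained RL; the paper studies when such goals reduce back to reward maximization.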

Reinforcement Learning of Implicit and Explicit Control Flow in Instructions

no code implementations 25 Feb 2021 Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh

Learning to flexibly follow task instructions in dynamic environments poses interesting challenges for reinforcement learning agents.

reinforcement-learning, Reinforcement Learning (RL), +2

Adaptive Pairwise Weights for Temporal Credit Assignment

no code implementations 9 Feb 2021 Zeyu Zheng, Risto Vuorio, Richard Lewis, Satinder Singh

In this empirical paper, we explore heuristics based on more general pairwise weightings that are functions of the state in which the action was taken, the state at the time of the reward, as well as the time interval between the two.

Reinforcement Learning (RL)
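
Written out (a rendering of the quoted description): the usual discount \gamma^{k-t} on a future reward r_k is replaced by a weight w(s_t, s_k, k - t) that may depend on the state where the action was taken, the state where the reward arrived, and the gap between them. A minimal sketch, with w assumed to be a given callable:

    import jax.numpy as jnp

    def pairwise_weighted_returns(states, rewards, w):
        # G_t = sum_{k >= t} w(s_t, s_k, k - t) * r_k; with
        # w = lambda s, s2, dt: gamma**dt this recovers the usual
        # discounted return.
        T = rewards.shape[0]
        return jnp.stack([
            sum(w(states[t], states[k], k - t) * rewards[k]
                for k in range(t, T))
            for t in range(T)
        ])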

Learning State Representations from Random Deep Action-conditional Predictions

1 code implementation NeurIPS 2021 Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh

Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions -- random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon -- form good auxiliary tasks for reinforcement learning (RL) problems.

Atari Games, Reinforcement Learning (RL), +2

Discovering a set of policies for the worst case reward

no code implementations ICLR 2021 Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting set-max policy (SMP) on the set of tasks.

Efficient Querying for Cooperative Probabilistic Commitments

no code implementations 14 Dec 2020 Qi Zhang, Edmund H. Durfee, Satinder Singh

Multiagent systems can use commitments as the core of a general coordination infrastructure, supporting both cooperative and non-cooperative interactions.

On Efficiency in Hierarchical Reinforcement Learning

no code implementations NeurIPS 2020 Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, in terms of both statistical and computational efficiency.

Computational Efficiency, Decision Making, +4

The Value Equivalence Principle for Model-Based Reinforcement Learning

no code implementations NeurIPS 2020 Christopher Grimm, André Barreto, Satinder Singh, David Silver

As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates.

Model-based Reinforcement Learning, reinforcement-learning, +2
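
Spelled out in standard notation (a transcription of the abstract's definition): models m and \tilde{m} are value equivalent with respect to a set of policies \Pi and functions \mathcal{V} when

    T_\pi^m v = T_\pi^{\tilde{m}} v   for all \pi in \Pi and v in \mathcal{V},

where (T_\pi^m v)(s) = E_m[ r(s, \pi(s)) + \gamma v(s') ] is the Bellman update induced by model m. Shrinking \Pi and \mathcal{V} relaxes which aspects of the environment the model must capture.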

Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment

no code implementations 28 Oct 2020 Wilka Carvalho, Anthony Liang, Kimin Lee, Sungryull Sohn, Honglak Lee, Richard L. Lewis, Satinder Singh

In this work, we show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task during task learning with an object-centric relational RL agent.

Object, Reinforcement Learning (RL), +1

Discovering Reinforcement Learning Algorithms

1 code implementation NeurIPS 2020 Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments.

Atari Games, Meta-Learning, +3

Meta-Gradient Reinforcement Learning with an Objective Discovered Online

no code implementations NeurIPS 2020 Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver

In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment.

Q-Learning, reinforcement-learning, +1

A Self-Tuning Actor-Critic Algorithm

no code implementations NeurIPS 2020 Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.

Atari Games, reinforcement-learning, +1

How Should an Agent Practice?

no code implementations 15 Dec 2019 Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh

We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available.

What Can Learned Intrinsic Rewards Capture?

no code implementations ICML 2020 Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.

Deep Reinforcement Learning for Multi-Driver Vehicle Dispatching and Repositioning Problem

no code implementations 25 Nov 2019 John Holler, Risto Vuorio, Zhiwei Qin, Xiaocheng Tang, Yan Jiao, Tiancheng Jin, Satinder Singh, Chenxi Wang, Jieping Ye

Order dispatching and driver repositioning (also known as fleet management) in the face of spatially and temporally varying supply and demand are central to a ride-sharing platform marketplace.

BIG-bench Machine Learning, Decision Making, +3

Disentangled Cumulants Help Successor Representations Transfer to New Tasks

no code implementations 25 Nov 2019 Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, Satinder Singh

This is in contrast to state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer.

Transfer Learning

Object-oriented state editing for HRL

no code implementations 31 Oct 2019 Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick

We introduce agents that use object-oriented reasoning to consider alternate states of the world in order to more quickly find solutions to problems.

Object

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles

no code implementations 23 Oct 2019 Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh

As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set.

Model Selection, reinforcement-learning, +1

Multiagent Reinforcement Learning in Games with an Iterated Dominance Solution

no code implementations 25 Sep 2019 Yoram Bachrach, Tor Lattimore, Marta Garnelo, Julien Perolat, David Balduzzi, Thomas Anthony, Satinder Singh, Thore Graepel

We show that MARL converges to the desired outcome if the rewards are designed so that exerting effort is the iterated dominance solution, but fails if it is merely a Nash equilibrium.

reinforcement-learning, Reinforcement Learning (RL)

Discovery of Useful Questions as Auxiliary Tasks

no code implementations NeurIPS 2019 Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions.

Reinforcement Learning (RL)

No Press Diplomacy: Modeling Multi-Agent Gameplay

1 code implementation 4 Sep 2019 Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, Aaron Courville

Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal.

Reinforcement Learning (RL)

Learning Independently-Obtainable Reward Functions

no code implementations 24 Jan 2019 Christopher Grimm, Satinder Singh

We present a novel method for learning a set of disentangled reward functions that sum to the original environment reward and are constrained to be independently obtainable.
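
The two constraints in that sentence can be written directly (notation introduced here for illustration): learn component rewards r_1, ..., r_n with

    \sum_i r_i(s, a) = r(s, a)   for all (s, a),

together with an independence objective encouraging each r_i to be obtainable by some policy without incidentally collecting the other components; the precise formulation of independence is in the paper.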

Generative Adversarial Self-Imitation Learning

no code implementations ICLR 2019 Yijie Guo, Junhyuk Oh, Satinder Singh, Honglak Lee

This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via a generative adversarial imitation learning framework.

Imitation Learning, reinforcement-learning, +1

Completing State Representations using Spectral Learning

no code implementations NeurIPS 2018 Nan Jiang, Alex Kulesza, Satinder Singh

A central problem in dynamical system modeling is state discovery—that is, finding a compact summary of the past that captures the information needed to predict the future.

Learning End-to-End Goal-Oriented Dialog with Multiple Answers

1 code implementation EMNLP 2018 Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, Lazaros Polymenakos

We also propose a new and more effective testbed, permuted-bAbI dialog tasks, by introducing multiple valid next utterances to the original-bAbI dialog tasks, which allows evaluation of goal-oriented dialog systems in a more realistic setting.

Goal-Oriented Dialog, valid

Many-Goals Reinforcement Learning

no code implementations 22 Jun 2018 Vivek Veeriah, Junhyuk Oh, Satinder Singh

Second, we explore whether many-goals updating can be used to pre-train a network to subsequently learn faster and better on a single main task of interest.

Q-Learning, reinforcement-learning, +1

Self-Imitation Learning

4 code implementations ICML 2018 Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions.

Atari Games, Imitation Learning
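
A minimal sketch of the SIL losses as described in the paper (batching, replay prioritization, and the exact loss weights are omitted or assumed): only transitions whose observed return R exceeds the current value estimate V(s) contribute, through the clipped advantage (R - V)_+.

    import jax
    import jax.numpy as jnp

    def sil_losses(log_prob_a, values, returns, beta=0.01):
        # log_prob_a: log pi(a_t | s_t); values: V(s_t); returns:
        # observed discounted returns R_t; all of shape (batch,).
        adv = jnp.maximum(returns - values, 0.0)  # (R - V)_+
        policy_loss = -(log_prob_a * jax.lax.stop_gradient(adv)).mean()
        value_loss = 0.5 * (adv ** 2).mean()  # pulls V up toward good returns
        return policy_loss + beta * value_loss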

NE-Table: A Neural key-value table for Named Entities

1 code implementation RANLP 2019 Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh, Lazaros Polymenakos

Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources.

Goal-Oriented Dialog, Question Answering, +2

On Learning Intrinsic Rewards for Policy Gradient Methods

1 code implementation NeurIPS 2018 Zeyu Zheng, Junhyuk Oh, Satinder Singh

In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents.

Atari Games, Decision Making, +1
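
Schematically (a summary of the bilevel scheme; \alpha and \beta are step sizes): the policy parameters \theta ascend the return under the combined reward r^{ex} + r^{in}_\eta, while the intrinsic-reward parameters \eta are meta-learned by differentiating the purely extrinsic objective through that policy update:

    \theta' = \theta + \alpha \nabla_\theta J^{ex+in}(\theta; \eta)
    \eta <- \eta + \beta \nabla_\eta J^{ex}(\theta'(\eta))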

The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA

1 code implementation 8 Mar 2018 Jiaxuan Wang, Ian Fox, Jonathan Skaza, Nick Linck, Satinder Singh, Jenna Wiens

During the 2017 NBA playoffs, Celtics coach Brad Stevens was faced with a difficult decision when defending against the Cavaliers: "Do you double and risk giving up easy shots, or stay at home and do the best you can?"

A Neural Method for Goal-Oriented Dialog Systems to interact with Named Entities

no code implementations ICLR 2018 Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh

Many goal-oriented dialog tasks, especially ones in which the dialog system has to interact with external knowledge sources such as databases, have to handle a large number of Named Entities (NEs).

Goal-Oriented Dialog, Question Answering

Markov Decision Processes with Continuous Side Information

no code implementations 15 Nov 2017 Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari

Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs.

PAC learning, Reinforcement Learning (RL)

Value Prediction Network

2 code implementations NeurIPS 2017 Junhyuk Oh, Satinder Singh, Honglak Lee

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network.

Atari Games, Reinforcement Learning (RL), +1
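
A toy sketch of the planning primitive (placeholder linear modules with random weights; VPN's real architecture and its tree-search planning are in the paper): the model never predicts observations, only an abstract next state plus its reward and value, which is enough to score actions by lookahead.

    import jax
    import jax.numpy as jnp

    key = jax.random.PRNGKey(0)
    D, A = 16, 4                                # abstract-state dim, num actions
    k1, k2, k3, k4, k5 = jax.random.split(key, 5)
    W_enc = jax.random.normal(k1, (32, D))      # observation (32,) -> abstract state
    W_trans = jax.random.normal(k2, (A, D, D))  # per-action transition
    W_rew = jax.random.normal(k3, (A, D))       # per-action reward head
    w_val = jax.random.normal(k4, (D,))         # value head

    encode = lambda obs: jnp.tanh(obs @ W_enc)
    transition = lambda z, a: jnp.tanh(z @ W_trans[a])
    reward = lambda z, a: z @ W_rew[a]
    value = lambda z: z @ w_val

    def one_step_q(obs, gamma=0.99):
        # score each action by predicted reward plus discounted value
        # of the predicted next abstract state (a depth-1 plan)
        z = encode(obs)
        return jnp.stack([reward(z, a) + gamma * value(transition(z, a))
                          for a in range(A)])

    q = one_step_q(jax.random.normal(k5, (32,)))  # shape (A,)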

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

1 code implementation ICML 2017 Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli

As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks.

reinforcement-learning, Reinforcement Learning (RL)

Repeated Inverse Reinforcement Learning

no code implementations NeurIPS 2017 Kareem Amin, Nan Jiang, Satinder Singh

We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human by acting suboptimally with respect to how the human would have acted.

Imitation Learning, reinforcement-learning, +1

Predicting Counselor Behaviors in Motivational Interviewing Encounters

no code implementations EACL 2017 Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An, Kathy J. Goggin, Delwyn Catley

As the number of people receiving psychotherapeutic treatment increases, the automatic evaluation of counseling practice arises as an important challenge in the clinical domain.

Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making

no code implementations 14 Mar 2017 Qi Zhang, Satinder Singh, Edmund Durfee

In cooperative multiagent planning, it can often be beneficial for an agent to make commitments about aspects of its behavior to others, allowing them in turn to plan their own behaviors without taking the agent's detailed behavior into account.

Decision Making

Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games

no code implementations 24 Apr 2016 Xiaoxiao Guo, Satinder Singh, Richard Lewis, Honglak Lee

We present an adaptation of PGRD (policy-gradient for reward-design) for learning a reward-bonus function to improve UCT (an MCTS algorithm).

Atari Games, Decision Making

Towards Resolving Unidentifiability in Inverse Reinforcement Learning

no code implementations 25 Jan 2016 Kareem Amin, Satinder Singh

We first demonstrate that if the learner can experiment with any transition dynamics on some fixed set of states and actions, then there exists an algorithm that reconstructs the agent's reward function to the fullest extent theoretically possible, and that requires only a small (logarithmic) number of experiments.

reinforcement-learning, Reinforcement Learning (RL)

Action-Conditional Video Prediction using Deep Networks in Atari Games

1 code implementation NeurIPS 2015 Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, Satinder Singh

Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future (image-)frames are dependent on control variables or actions as well as previous frames.

Atari Games, Reinforcement Learning (RL), +1

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

no code implementations NeurIPS 2014 Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang

The combination of modern Reinforcement Learning and Deep Learning approaches holds the promise of making significant progress on challenging applications requiring both rich perception and policy-selection.

Atari Games, reinforcement-learning, +1

Learning to Make Predictions In Partially Observable Environments Without a Generative Model

no code implementations 16 Jan 2014 Erik Talvitie, Satinder Singh

We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.

Reward Mapping for Transfer in Long-Lived Agents

no code implementations NeurIPS 2013 Xiaoxiao Guo, Satinder Singh, Richard L. Lewis

We demonstrate that our approach can substantially improve the agent's performance relative to other approaches, including an approach that transfers policies.

Graphical Models for Game Theory

no code implementations 10 Jan 2013 Michael Kearns, Michael L. Littman, Satinder Singh

The interpretation is that the payoff to player i is determined entirely by the actions of player i and his neighbors in the graph, and thus the payoff matrix to player i is indexed only by these players.

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

1 code implementation Artificial Intelligence 1999 Richard S. Sutton, Doina Precup, Satinder Singh

In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.

Q-Learning, reinforcement-learning
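
The interchangeability the abstract mentions shows up directly in SMDP Q-learning (the standard update from the options framework; a primitive action is just an option that lasts k = 1 steps):

    import jax.numpy as jnp

    def smdp_q_update(Q, s, o, r_cum, k, s_next, alpha=0.1, gamma=0.99):
        # Q: (num_states, num_options) table. Option o ran for k steps
        # from state s, accumulated discounted reward r_cum, and ended
        # in s_next.
        target = r_cum + gamma**k * jnp.max(Q[s_next])
        return Q.at[s, o].add(alpha * (target - Q[s, o]))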
