Search Results for author: Satinder Singh

Found 76 papers, 22 papers with code

Diversifying AI: Towards Creative Chess with AlphaZero

no code implementations 17 Aug 2023 Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones.

Decision Making, Game of Chess

A Definition of Continual Reinforcement Learning

no code implementations NeurIPS 2023 David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh

Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents.

Continual Learning, reinforcement-learning

On the Convergence of Bounded Agents

no code implementations 20 Jul 2023 David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh

Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing.

reinforcement-learning
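
The standard notion quoted above can be made concrete with a minimal sketch (assuming a tabular policy; names are illustrative): an agent has converged once its action probabilities stop changing at every state.

    import jax.numpy as jnp

    def has_converged(policy_prev, policy_curr, tol=1e-6):
        # policy_prev, policy_curr: (num_states, num_actions) action
        # probabilities from consecutive updates. Converged, in the
        # standard sense quoted above, when no state's behavior moved
        # by more than tol.
        return bool(jnp.max(jnp.abs(policy_curr - policy_prev)) < tol)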

Structured State Space Models for In-Context Reinforcement Learning

2 code implementations NeurIPS 2023 Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani

We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks.

Continuous Control, Meta-Learning, +1
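
One standard way to obtain parallel, resettable linear recurrences, sketched below as an assumption about the underlying trick rather than the authors' implementation: represent each step of x_t = A_t * x_{t-1} + b_t as a pair (a_t, b_t), zero a_t at episode boundaries so the carried state is dropped, and combine pairs associatively.

    import jax
    import jax.numpy as jnp

    def resettable_scan(A, b, reset):
        # A: (T, D) diagonal transition per step; b: (T, D) inputs;
        # reset: (T,) 1.0 where a new episode starts, else 0.0.
        a = A * (1.0 - reset)[:, None]  # drop the carry at resets

        def combine(left, right):
            a1, b1 = left
            a2, b2 = right
            # composing x -> a2 * (a1 * x + b1) + b2
            return a2 * a1, a2 * b1 + b2

        _, x = jax.lax.associative_scan(combine, (a, b))
        return x  # hidden states for all t, assuming a zero initial state

    T, D = 8, 4
    key = jax.random.PRNGKey(0)
    states = resettable_scan(jnp.full((T, D), 0.9),
                             jax.random.normal(key, (T, D)),
                             jnp.zeros(T).at[4].set(1.0))  # boundary at t=4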

Hierarchical Reinforcement Learning in Complex 3D Environments

no code implementations 28 Feb 2023 Bernardo Avila Pires, Feryal Behbahani, Hubert Soyer, Kyriacos Nikiforou, Thomas Keck, Satinder Singh

Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction, transfer, and skill reuse.

Hierarchical Reinforcement Learning, reinforcement-learning, +1

Composing Task Knowledge with Modular Successor Feature Approximators

1 code implementation 28 Jan 2023 Wilka Carvalho, Angelos Filos, Richard L. Lewis, Honglak Lee, Satinder Singh

Recently, the Successor Features and Generalized Policy Improvement (SF&GPI) framework has been proposed as a method for learning, composing, and transferring predictive knowledge and behavior.

POMRL: No-Regret Learning-to-Plan with Increasing Horizons

no code implementations 30 Dec 2022 Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task.

Meta Reinforcement Learning, Reinforcement Learning (RL)

Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction

no code implementations 30 Oct 2022 Dilip Arumugam, Satinder Singh

The Bayes-Adaptive Markov Decision Process (BAMDP) formalism pursues the Bayes-optimal solution to the exploration-exploitation trade-off in reinforcement learning.

Efficient Exploration, reinforcement-learning, +1
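
For background (a summary in standard notation, not the paper's own): a BAMDP folds the agent's uncertainty into the state itself, replacing s_t with the hyperstate \tilde{s}_t = (s_t, b_t), where b_t(\theta) = p(\theta | h_t) is the posterior over the unknown dynamics given the history h_t. Bayes-optimal exploration is then ordinary optimal planning in this (much larger) augmented MDP, whose belief component b_t is the expensive part to plan over.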

In-context Reinforcement Learning with Algorithm Distillation

1 code implementation 25 Oct 2022 Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model.

reinforcement-learning
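
The data preparation at the heart of AD can be sketched as follows (an illustration; field names and the flattening scheme are assumptions, and the paper's tokenization may differ): transitions from many consecutive episodes of a source agent's training run are concatenated into one long sequence, so a causal model trained to predict the next action must capture the improvement happening across episodes, not just within them.

    import jax.numpy as jnp

    def flatten_learning_history(episodes):
        # episodes: list of dicts with 'obs' (T_i, D), 'acts' (T_i,),
        # 'rews' (T_i,), ordered by when they occurred during the
        # source agent's training run.
        obs = jnp.concatenate([e["obs"] for e in episodes], axis=0)
        acts = jnp.concatenate([e["acts"] for e in episodes], axis=0)
        rews = jnp.concatenate([e["rews"] for e in episodes], axis=0)
        # a causal sequence model is then trained to predict acts[t]
        # from everything before index t
        return obs, acts, rews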

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

no code implementations 19 Oct 2022 Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh

In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits an exploratory behavior whilst it utilizes large diverse datasets.

reinforcement-learning, Reinforcement Learning (RL), +2

Meta-Gradients in Non-Stationary Environments

no code implementations 13 Sep 2022 Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features.

Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

no code implementations 26 May 2022 Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh

Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations.

GrASP: Gradient-Based Affordance Selection for Planning

no code implementations 8 Feb 2022 Vivek Veeriah, Zeyu Zheng, Richard Lewis, Satinder Singh

Our empirical work shows that it is feasible to learn to select both primitive-action and option affordances, and that simultaneously learning to select affordances and planning with a learned value-equivalent model can outperform model-free RL.

Reinforcement Learning (RL)

On the Expressivity of Markov Reward

no code implementations NeurIPS 2021 David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, Satinder Singh

We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists.

Bootstrapped Meta-Learning

1 code implementation ICLR 2022 Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

We achieve a new state-of-the-art for model-free agents on the Atari ALE benchmark and demonstrate that it yields both performance and efficiency gains in multi-task meta-learning.

Efficient Exploration, Few-Shot Learning, +1

Proper Value Equivalence

1 code implementation NeurIPS 2021 Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh

The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning.

Model-based Reinforcement Learning, Reinforcement Learning (RL)

Discovering Diverse Nearly Optimal Policies with Successor Features

no code implementations ICML Workshop URL 2021 Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while ensuring that they are near optimal.

Reward is enough for convex MDPs

no code implementations NeurIPS 2021 Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP).

Reinforcement Learning (RL)
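
In standard notation (a paraphrase of the setup): writing d^\pi for the state-action occupancy measure of policy \pi, the classical RL objective is linear in d^\pi,

    max_\pi \langle r, d^\pi \rangle,

whereas a convex MDP asks for

    max_\pi f(d^\pi), with f concave

(equivalently, minimizing a convex cost of the occupancy), which covers goals such as pure exploration, apprenticeship learning, and constrained RL; the paper studies when such goals reduce back to reward maximization.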

Reinforcement Learning of Implicit and Explicit Control Flow in Instructions

no code implementations 25 Feb 2021 Ethan A. Brooks, Janarthanan Rajendran, Richard L. Lewis, Satinder Singh

Learning to flexibly follow task instructions in dynamic environments poses interesting challenges for reinforcement learning agents.

reinforcement-learning, Reinforcement Learning (RL), +2

Adaptive Pairwise Weights for Temporal Credit Assignment

no code implementations 9 Feb 2021 Zeyu Zheng, Risto Vuorio, Richard Lewis, Satinder Singh

In this empirical paper, we explore heuristics based on more general pairwise weightings that are functions of the state in which the action was taken, the state at the time of the reward, as well as the time interval between the two.

Reinforcement Learning (RL)
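
Written out (a rendering of the quoted description): the usual discount \gamma^{k-t} on a future reward r_k is replaced by a weight w(s_t, s_k, k - t) that may depend on the state where the action was taken, the state where the reward arrived, and the gap between them. A minimal sketch, with w assumed to be a given callable:

    import jax.numpy as jnp

    def pairwise_weighted_returns(states, rewards, w):
        # G_t = sum_{k >= t} w(s_t, s_k, k - t) * r_k; with
        # w = lambda s, s2, dt: gamma**dt this recovers the usual
        # discounted return.
        T = rewards.shape[0]
        return jnp.stack([
            sum(w(states[t], states[k], k - t) * rewards[k]
                for k in range(t, T))
            for t in range(T)
        ])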

Learning State Representations from Random Deep Action-conditional Predictions

1 code implementation NeurIPS 2021 Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, Satinder Singh

Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions -- random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon -- form good auxiliary tasks for reinforcement learning (RL) problems.

Atari Games, Reinforcement Learning (RL), +2

Discovering a set of policies for the worst case reward

no code implementations ICLR 2021 Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

Our main contribution is a policy iteration algorithm that builds a set of policies in order to maximize the worst-case performance of the resulting set-max policy (SMP) on the set of tasks.

Efficient Querying for Cooperative Probabilistic Commitments

no code implementations 14 Dec 2020 Qi Zhang, Edmund H. Durfee, Satinder Singh

Multiagent systems can use commitments as the core of a general coordination infrastructure, supporting both cooperative and non-cooperative interactions.

On Efficiency in Hierarchical Reinforcement Learning

no code implementations NeurIPS 2020 Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, in terms of both statistical and computational efficiency.

Computational Efficiency, Decision Making, +4

The Value Equivalence Principle for Model-Based Reinforcement Learning

no code implementations NeurIPS 2020 Christopher Grimm, André Barreto, Satinder Singh, David Silver

As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates.

Model-based Reinforcement Learning, reinforcement-learning, +2
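
Spelled out in standard notation (a transcription of the abstract's definition): models m and \tilde{m} are value equivalent with respect to a set of policies \Pi and functions \mathcal{V} when

    T_\pi^m v = T_\pi^{\tilde{m}} v   for all \pi in \Pi and v in \mathcal{V},

where (T_\pi^m v)(s) = E_m[ r(s, \pi(s)) + \gamma v(s') ] is the Bellman update induced by model m. Shrinking \Pi and \mathcal{V} relaxes which aspects of the environment the model must capture.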

Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment

no code implementations 28 Oct 2020 Wilka Carvalho, Anthony Liang, Kimin Lee, Sungryull Sohn, Honglak Lee, Richard L. Lewis, Satinder Singh

In this work, we show that one can learn object-interaction tasks from scratch without supervision by learning an attentive object-model as an auxiliary task during task learning with an object-centric relational RL agent.

Object, Reinforcement Learning (RL), +1

Discovering Reinforcement Learning Algorithms

1 code implementation NeurIPS 2020 Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments.

Atari Games, Meta-Learning, +3

Meta-Gradient Reinforcement Learning with an Objective Discovered Online

no code implementations NeurIPS 2020 Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver

In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment.

Q-Learning, reinforcement-learning, +1

A Self-Tuning Actor-Critic Algorithm

no code implementations NeurIPS 2020 Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain.

Atari Games, reinforcement-learning, +1

How Should an Agent Practice?

no code implementations 15 Dec 2019 Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, Satinder Singh

We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available.

What Can Learned Intrinsic Rewards Capture?

no code implementations ICML 2020 Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.

Deep Reinforcement Learning for Multi-Driver Vehicle Dispatching and Repositioning Problem

no code implementations 25 Nov 2019 John Holler, Risto Vuorio, Zhiwei Qin, Xiaocheng Tang, Yan Jiao, Tiancheng Jin, Satinder Singh, Chenxi Wang, Jieping Ye

Order dispatching and driver repositioning (also known as fleet management) in the face of spatially and temporally varying supply and demand are central to a ride-sharing platform marketplace.

BIG-bench Machine Learning, Decision Making, +3

Disentangled Cumulants Help Successor Representations Transfer to New Tasks

no code implementations 25 Nov 2019 Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, Satinder Singh

This is in contrast to state-of-the-art reinforcement learning agents, which typically start learning each new task from scratch and struggle with knowledge transfer.

Transfer Learning

Object-oriented state editing for HRL

no code implementations 31 Oct 2019 Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick

We introduce agents that use object-oriented reasoning to consider alternate states of the world in order to more quickly find solutions to problems.

Object

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles

no code implementations 23 Oct 2019 Aditya Modi, Nan Jiang, Ambuj Tewari, Satinder Singh

As an extension, we also consider the more challenging problem of model selection, where the state features are unknown and can be chosen from a large candidate set.

Model Selection, reinforcement-learning, +1

Multiagent Reinforcement Learning in Games with an Iterated Dominance Solution

no code implementations 25 Sep 2019 Yoram Bachrach, Tor Lattimore, Marta Garnelo, Julien Perolat, David Balduzzi, Thomas Anthony, Satinder Singh, Thore Graepel

We show that MARL converges to the desired outcome if the rewards are designed so that exerting effort is the iterated dominance solution, but fails if it is merely a Nash equilibrium.

reinforcement-learning, Reinforcement Learning (RL)

Discovery of Useful Questions as Auxiliary Tasks

no code implementations NeurIPS 2019 Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions.

Reinforcement Learning (RL)

No Press Diplomacy: Modeling Multi-Agent Gameplay

1 code implementation 4 Sep 2019 Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, Aaron Courville

Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal.

Reinforcement Learning (RL)

Learning Independently-Obtainable Reward Functions

no code implementations 24 Jan 2019 Christopher Grimm, Satinder Singh

We present a novel method for learning a set of disentangled reward functions that sum to the original environment reward and are constrained to be independently obtainable.
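
The two constraints in that sentence can be written directly (notation introduced here for illustration): learn component rewards r_1, ..., r_n with

    \sum_i r_i(s, a) = r(s, a)   for all (s, a),

together with an independence objective encouraging each r_i to be obtainable by some policy without incidentally collecting the other components; the precise formulation of independence is in the paper.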

Generative Adversarial Self-Imitation Learning

no code implementations ICLR 2019 Yijie Guo, Junhyuk Oh, Satinder Singh, Honglak Lee

This paper explores a simple regularizer for reinforcement learning by proposing Generative Adversarial Self-Imitation Learning (GASIL), which encourages the agent to imitate past good trajectories via a generative adversarial imitation learning framework.

Imitation Learning, reinforcement-learning, +1

Completing State Representations using Spectral Learning

no code implementations NeurIPS 2018 Nan Jiang, Alex Kulesza, Satinder Singh

A central problem in dynamical system modeling is state discovery—that is, finding a compact summary of the past that captures the information needed to predict the future.

Learning End-to-End Goal-Oriented Dialog with Multiple Answers

1 code implementation EMNLP 2018 Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, Lazaros Polymenakos

We also propose a new and more effective testbed, permuted-bAbI dialog tasks, by introducing multiple valid next utterances to the original-bAbI dialog tasks, which allows evaluation of goal-oriented dialog systems in a more realistic setting.

Goal-Oriented Dialog, valid

Many-Goals Reinforcement Learning

no code implementations 22 Jun 2018 Vivek Veeriah, Junhyuk Oh, Satinder Singh

Second, we explore whether many-goals updating can be used to pre-train a network to subsequently learn faster and better on a single main task of interest.

Q-Learning, reinforcement-learning, +1

Self-Imitation Learning

4 code implementations ICML 2018 Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee

This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent's past good decisions.

Atari Games, Imitation Learning
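
A minimal sketch of the SIL losses as described in the paper (batching, replay prioritization, and the exact loss weights are omitted or assumed): only transitions whose observed return R exceeds the current value estimate V(s) contribute, through the clipped advantage (R - V)_+.

    import jax
    import jax.numpy as jnp

    def sil_losses(log_prob_a, values, returns, beta=0.01):
        # log_prob_a: log pi(a_t | s_t); values: V(s_t); returns:
        # observed discounted returns R_t; all of shape (batch,).
        adv = jnp.maximum(returns - values, 0.0)  # (R - V)_+
        policy_loss = -(log_prob_a * jax.lax.stop_gradient(adv)).mean()
        value_loss = 0.5 * (adv ** 2).mean()  # pulls V up toward good returns
        return policy_loss + beta * value_loss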

NE-Table: A Neural key-value table for Named Entities

1 code implementation RANLP 2019 Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh, Lazaros Polymenakos

Many Natural Language Processing (NLP) tasks depend on using Named Entities (NEs) that are contained in texts and in external knowledge sources.

Goal-Oriented Dialog, Question Answering, +2

On Learning Intrinsic Rewards for Policy Gradient Methods

1 code implementation NeurIPS 2018 Zeyu Zheng, Junhyuk Oh, Satinder Singh

In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents.

Atari Games, Decision Making, +1
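
Schematically (a summary of the bilevel scheme; \alpha and \beta are step sizes): the policy parameters \theta ascend the return under the combined reward r^{ex} + r^{in}_\eta, while the intrinsic-reward parameters \eta are meta-learned by differentiating the purely extrinsic objective through that policy update:

    \theta' = \theta + \alpha \nabla_\theta J^{ex+in}(\theta; \eta)
    \eta <- \eta + \beta \nabla_\eta J^{ex}(\theta'(\eta))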

The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA

1 code implementation 8 Mar 2018 Jiaxuan Wang, Ian Fox, Jonathan Skaza, Nick Linck, Satinder Singh, Jenna Wiens

During the 2017 NBA playoffs, Celtics coach Brad Stevens was faced with a difficult decision when defending against the Cavaliers: "Do you double and risk giving up easy shots, or stay at home and do the best you can?"

A Neural Method for Goal-Oriented Dialog Systems to interact with Named Entities

no code implementations ICLR 2018 Janarthanan Rajendran, Jatin Ganhotra, Xiaoxiao Guo, Mo Yu, Satinder Singh

Many goal-oriented dialog tasks, especially ones in which the dialog system has to interact with external knowledge sources such as databases, have to handle a large number of Named Entities (NEs).

Goal-Oriented Dialog, Question Answering

Markov Decision Processes with Continuous Side Information

no code implementations 15 Nov 2017 Aditya Modi, Nan Jiang, Satinder Singh, Ambuj Tewari

Because our lower bound has an exponential dependence on the dimension, we consider a tractable linear setting where the context is used to create linear combinations of a finite set of MDPs.

PAC learning, Reinforcement Learning (RL)

Value Prediction Network

2 code implementations NeurIPS 2017 Junhyuk Oh, Satinder Singh, Honglak Lee

This paper proposes a novel deep reinforcement learning (RL) architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network.

Atari Games, Reinforcement Learning (RL), +1
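
A toy sketch of the planning primitive (placeholder linear modules with random weights; VPN's real architecture and its tree-search planning are in the paper): the model never predicts observations, only an abstract next state plus its reward and value, which is enough to score actions by lookahead.

    import jax
    import jax.numpy as jnp

    key = jax.random.PRNGKey(0)
    D, A = 16, 4                                # abstract-state dim, num actions
    k1, k2, k3, k4, k5 = jax.random.split(key, 5)
    W_enc = jax.random.normal(k1, (32, D))      # observation (32,) -> abstract state
    W_trans = jax.random.normal(k2, (A, D, D))  # per-action transition
    W_rew = jax.random.normal(k3, (A, D))       # per-action reward head
    w_val = jax.random.normal(k4, (D,))         # value head

    encode = lambda obs: jnp.tanh(obs @ W_enc)
    transition = lambda z, a: jnp.tanh(z @ W_trans[a])
    reward = lambda z, a: z @ W_rew[a]
    value = lambda z: z @ w_val

    def one_step_q(obs, gamma=0.99):
        # score each action by predicted reward plus discounted value
        # of the predicted next abstract state (a depth-1 plan)
        z = encode(obs)
        return jnp.stack([reward(z, a) + gamma * value(transition(z, a))
                          for a in range(A)])

    q = one_step_q(jax.random.normal(k5, (32,)))  # shape (A,)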

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning

1 code implementation ICML 2017 Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli

As a step towards developing zero-shot task generalization capabilities in reinforcement learning (RL), we introduce a new RL problem where the agent should learn to execute sequences of instructions after learning useful skills that solve subtasks.

reinforcement-learning, Reinforcement Learning (RL)

Repeated Inverse Reinforcement Learning

no code implementations NeurIPS 2017 Kareem Amin, Nan Jiang, Satinder Singh

We introduce a novel repeated Inverse Reinforcement Learning problem: the agent has to act on behalf of a human in a sequence of tasks and wishes to minimize the number of tasks in which it surprises the human by acting suboptimally with respect to how the human would have acted.

Imitation Learning, reinforcement-learning, +1

Predicting Counselor Behaviors in Motivational Interviewing Encounters

no code implementations EACL 2017 Verónica Pérez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An, Kathy J. Goggin, Delwyn Catley

As the number of people receiving psychotherapeutic treatment increases, the automatic evaluation of counseling practice arises as an important challenge in the clinical domain.

Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making

no code implementations 14 Mar 2017 Qi Zhang, Satinder Singh, Edmund Durfee

In cooperative multiagent planning, it can often be beneficial for an agent to make commitments about aspects of its behavior to others, allowing them in turn to plan their own behaviors without taking the agent's detailed behavior into account.

Decision Making

Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games

no code implementations 24 Apr 2016 Xiaoxiao Guo, Satinder Singh, Richard Lewis, Honglak Lee

We present an adaptation of PGRD (policy-gradient for reward-design) for learning a reward-bonus function to improve UCT (an MCTS algorithm).

Atari Games, Decision Making

Towards Resolving Unidentifiability in Inverse Reinforcement Learning

no code implementations 25 Jan 2016 Kareem Amin, Satinder Singh

We first demonstrate that if the learner can experiment with any transition dynamics on some fixed set of states and actions, then there exists an algorithm that reconstructs the agent's reward function to the fullest extent theoretically possible, and that requires only a small (logarithmic) number of experiments.

reinforcement-learning, Reinforcement Learning (RL)

Action-Conditional Video Prediction using Deep Networks in Atari Games

1 code implementation NeurIPS 2015 Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, Satinder Singh

Motivated by vision-based reinforcement learning (RL) problems, in particular Atari games from the recent benchmark Arcade Learning Environment (ALE), we consider spatio-temporal prediction problems where future (image-)frames are dependent on control variables or actions as well as previous frames.

Atari Games, Reinforcement Learning (RL), +1

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

no code implementations NeurIPS 2014 Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang

The combination of modern Reinforcement Learning and Deep Learning approaches holds the promise of making significant progress on challenging applications requiring both rich perception and policy-selection.

Atari Games, reinforcement-learning, +1

Learning to Make Predictions In Partially Observable Environments Without a Generative Model

no code implementations 16 Jan 2014 Erik Talvitie, Satinder Singh

We formalize the problem of learning a prediction profile model as a transformation of the original model-learning problem, and show empirically that one can learn prediction profile models that make a small set of important predictions even in systems that are too complex for standard generative models.

Reward Mapping for Transfer in Long-Lived Agents

no code implementations NeurIPS 2013 Xiaoxiao Guo, Satinder Singh, Richard L. Lewis

We demonstrate that our approach can substantially improve the agent's performance relative to other approaches, including an approach that transfers policies.

Graphical Models for Game Theory

no code implementations 10 Jan 2013 Michael Kearns, Michael L. Littman, Satinder Singh

The interpretation is that the payoff to player i is determined entirely by the actions of player i and his neighbors in the graph, and thus the payoff matrix to player i is indexed only by these players.

Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning

1 code implementation Artificial Intelligence 1999 Richard S. Sutton, Doina Precup, Satinder Singh

In particular, we show that options may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.

Q-Learning, reinforcement-learning
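
The interchangeability the abstract mentions shows up directly in SMDP Q-learning (the standard update from the options framework; a primitive action is just an option that lasts k = 1 steps):

    import jax.numpy as jnp

    def smdp_q_update(Q, s, o, r_cum, k, s_next, alpha=0.1, gamma=0.99):
        # Q: (num_states, num_options) table. Option o ran for k steps
        # from state s, accumulated discounted reward r_cum, and ended
        # in s_next.
        target = r_cum + gamma**k * jnp.max(Q[s_next])
        return Q.at[s, o].add(alpha * (target - Q[s, o]))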
