1 code implementation • 6 Feb 2024 • Yash Shukla, Tanushree Burman, Abhishek Kulkarni, Robert Wright, Alvaro Velasquez, Jivko Sinapov
In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions.
no code implementations • 2 Jan 2024 • Lorenzo Venturini, Samuel Budd, Alfonso Farruggia, Robert Wright, Jacqueline Matthew, Thomas G. Day, Bernhard Kainz, Reza Razavi, Jo V. Hajnal
We use a Bayesian method to estimate the true value of each biometric from a large number of measurements and probabilistically reject outliers.
no code implementations • 14 Oct 2023 • Yash Shukla, Wenchang Gao, Vasanth Sarathy, Alvaro Velasquez, Robert Wright, Jivko Sinapov
In this work, we propose LgTS (LLM-guided Teacher-Student learning), a novel approach that explores the planning abilities of LLMs to provide a graphical representation of the sub-goals to a reinforcement learning (RL) agent that does not have access to the transition dynamics of the environment.
1 code implementation • 13 Oct 2023 • Yash Shukla, Bharat Kesari, Shivam Goel, Robert Wright, Jivko Sinapov
We use Generative Adversarial Networks (GANs) along with a cycle-consistency loss to map the observations between the source and target domains and later use this learned mapping to clone the successful source task behavior policy to the target domain.
no code implementations • 12 Oct 2023 • Geigh Zollicoffer, Kenneth Eaton, Jonathan Balloch, Julia Kim, Mark O. Riedl, Robert Wright
We refer to the sudden change in visual properties or state transitions as novelties.
1 code implementation • 11 Apr 2023 • Yash Shukla, Abhishek Kulkarni, Robert Wright, Alvaro Velasquez, Jivko Sinapov
Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance on a complex sequential decision-making problem relative to state-of-the-art curriculum learning (e. g, teacher-student, self-play) and automaton-guided reinforcement learning baselines (e. g, Q-Learning for Reward Machines).
no code implementations • 16 Jan 2023 • Jonathan Balloch, Zhiyu Lin, Robert Wright, Xiangyu Peng, Mustafa Hussain, Aarun Srinivas, Julia Kim, Mark O. Riedl
Additionally, WorldCloner augments the policy learning process using imagination-based adaptation, where the world model simulates transitions of the post-novelty environment to help the policy adapt.
1 code implementation • 29 Jun 2022 • Veronika A. Zimmer, Alberto Gomez, Emily Skelton, Robert Wright, Gavin Wheeler, Shujie Deng, Nooshin Ghavami, Karen Lloyd, Jacqueline Matthew, Bernhard Kainz, Daniel Rueckert, Joseph V. Hajnal, Julia A. Schnabel
Automatic segmentation of the placenta in fetal ultrasound (US) is challenging due to the (i) high diversity of placenta appearance, (ii) the restricted quality in US resulting in highly variable reference annotations, and (iii) the limited field-of-view of US prohibiting whole placenta assessment at late gestation.
no code implementations • 23 Mar 2022 • Jonathan Balloch, Zhiyu Lin, Mustafa Hussain, Aarun Srinivas, Robert Wright, Xiangyu Peng, Julia Kim, Mark Riedl
We provide an ontology of for novelties most relevant to sequential decision making, which distinguishes between novelties that affect objects versus actions, unary properties versus non-unary relations, and the distribution of solutions to a task.
no code implementations • 22 Feb 2018 • Andrew Cohen, Lei Yu, Robert Wright
We study an important yet under-addressed problem of quickly and safely improving policies in online reinforcement learning domains.
1 code implementation • 30 Nov 2017 • Joseph Antonelli, Maitreyi Mazumdar, David Bellinger, David C. Christiani, Robert Wright, Brent A. Coull
To estimate the health effects of complex mixtures we propose a flexible Bayesian approach that allows exposures to interact with each other and have nonlinear relationships with the outcome.
Methodology