Search Results for author: Andrey Kolobov

Found 19 papers, 9 papers with code

PRISE: Learning Temporal Action Abstractions as a Sequence Compression Problem

1 code implementation • 16 Feb 2024 • Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov

To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains.

Continuous Control • Few-Shot Imitation Learning • +2
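
The BPE step is easy to picture in code. Below is a minimal sketch, not PRISE's actual implementation: it assumes primitive actions have already been quantized into integer tokens, and it merges frequently co-occurring token pairs into longer "skill" tokens.

```python
# Illustrative sketch (not PRISE's code): byte pair encoding over
# discretized action sequences, so frequent action pairs become single
# multi-step "skill" tokens.
from collections import Counter

def most_frequent_pair(sequences):
    """Count adjacent token pairs across all action sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None

def merge_pair(seq, pair, new_token):
    """Replace every occurrence of `pair` in `seq` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def bpe_skills(sequences, num_merges):
    """Learn `num_merges` skill tokens; return merged sequences and vocab."""
    vocab = {}  # new_token -> the pair it replaces
    next_token = max(t for s in sequences for t in s) + 1
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        vocab[next_token] = pair
        sequences = [merge_pair(s, pair, next_token) for s in sequences]
        next_token += 1
    return sequences, vocab

# Toy demo: token ids stand in for quantized primitive actions.
demos = [[0, 1, 2, 0, 1, 2], [0, 1, 0, 1, 2]]
merged, vocab = bpe_skills(demos, num_merges=2)
print(merged, vocab)
```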

WindSeer: Real-time volumetric wind prediction over complex terrain aboard a small UAV

no code implementations • 18 Jan 2024 • Florian Achermann, Thomas Stastny, Bogdan Danciu, Andrey Kolobov, Jen Jen Chung, Roland Siegwart, Nicholas Lawrance

Real-time high-resolution wind predictions are beneficial for various applications including safe manned and unmanned aviation.

LLF-Bench: Benchmark for Interactive Learning from Language Feedback

no code implementations • 11 Dec 2023 • Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan

We introduce a new benchmark, LLF-Bench (Learning from Language Feedback Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively learn from natural language feedback and instructions.

Information Retrieval • OpenAI Gym
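
The interaction pattern the benchmark evaluates looks roughly like a Gym loop in which language feedback stands in for a numeric reward. The environment and agent below are hypothetical toys, not the actual LLF-Bench API.

```python
# Hypothetical sketch of learning from language feedback: the env returns
# a sentence, not a reward, and the agent must act on the sentence.
import random

class ToyLanguageFeedbackEnv:
    """Guess a hidden integer; feedback is a hint sentence."""
    def __init__(self, low=0, high=9):
        self.low, self.high = low, high
        self.target = random.randint(low, high)

    def reset(self):
        self.target = random.randint(self.low, self.high)
        return "Guess the hidden number.", "Pick an integer in [0, 9]."

    def step(self, action):
        if action == self.target:
            return "Correct!", True
        hint = "higher" if action < self.target else "lower"
        return f"Wrong. Try a {hint} number.", False

env = ToyLanguageFeedbackEnv()
obs, instruction = env.reset()
low, high = 0, 9
for _ in range(10):
    guess = (low + high) // 2          # agent policy: binary search on hints
    feedback, done = env.step(guess)
    print(guess, "->", feedback)
    if done:
        break
    if "higher" in feedback:
        low = guess + 1
    else:
        high = guess - 1
```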

Interactive Robot Learning from Verbal Correction

no code implementations • 26 Oct 2023 • Huihan Liu, Alice Chen, Yuke Zhu, Adith Swaminathan, Andrey Kolobov, Ching-An Cheng

A key feature of OLAF is its ability to update the robot's visuomotor neural policy based on verbal feedback, so that it avoids repeating mistakes in the future.

Language Modelling • Large Language Model

Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

no code implementations • 30 Jun 2023 • Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine

Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to.

Instruction Following
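
A rough sketch of that alignment idea, with stand-in linear encoders (none of these names or shapes come from the paper): score each instruction against the goal-minus-start embedding delta, so a contrastive loss could favor the matching pair.

```python
# Illustrative sketch: align a language embedding with the *change* between
# start and goal image embeddings, rather than with the goal image alone.
import numpy as np

rng = np.random.default_rng(0)

def encode_image(img, W):          # stand-in visual encoder
    return W @ img

def encode_text(tokens, V):        # stand-in language encoder
    return V @ tokens

def alignment_scores(starts, goals, texts, W, V):
    """Cosine similarity between text embeddings and goal-minus-start deltas."""
    deltas = encode_image(goals.T, W).T - encode_image(starts.T, W).T
    lang = encode_text(texts.T, V).T
    deltas /= np.linalg.norm(deltas, axis=1, keepdims=True)
    lang /= np.linalg.norm(lang, axis=1, keepdims=True)
    return lang @ deltas.T          # (num_texts, num_deltas)

# Toy batch: 4 (start, goal, instruction) triples with random features.
starts, goals = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
texts = rng.normal(size=(4, 32))
W, V = rng.normal(size=(8, 16)), rng.normal(size=(8, 32))
scores = alignment_scores(starts, goals, texts, W, V)
# A contrastive loss would push the diagonal of `scores` above off-diagonals.
print(np.round(scores, 2))
```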

Improving Offline RL by Blending Heuristics

no code implementations • 1 Jun 2023 • Sinong Geng, Aldo Pacchiano, Andrey Kolobov, Ching-An Cheng

We propose Heuristic Blending (HUBL), a simple performance-improving technique for a broad class of offline RL algorithms based on value bootstrapping.

D4RL • Offline RL
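
One way to read the abstract's blending idea, sketched on a tabular Q-update (the paper's exact relabeling scheme may differ): mix a heuristic value h(s') into the bootstrapped target with a blending weight.

```python
# Sketch of heuristic blending in a bootstrapped update; a reading of the
# abstract, not the authors' code.
import numpy as np

def blended_q_update(Q, s, a, r, s_next, h, alpha=0.1, gamma=0.99, lam=0.5):
    """One tabular Q-learning step with a heuristic-blended target.

    lam = 0 recovers ordinary bootstrapping; lam = 1 trusts h(s') fully.
    """
    bootstrap = np.max(Q[s_next])
    target = r + gamma * (lam * h[s_next] + (1.0 - lam) * bootstrap)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy usage: 3 states, 2 actions, heuristic from (say) Monte Carlo returns.
Q = np.zeros((3, 2))
h = np.array([0.0, 0.5, 1.0])
Q = blended_q_update(Q, s=0, a=1, r=0.1, s_next=2, h=h)
print(Q)
```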

PLEX: Making the Most of the Available Data for Robotic Manipulation Pretraining

no code implementations • 15 Mar 2023 • Garrett Thomas, Ching-An Cheng, Ricky Loynd, Felipe Vieira Frujeri, Vibhav Vineet, Mihai Jalobeanu, Andrey Kolobov

A rich representation is key to general robotic manipulation, but existing approaches to representation learning require large amounts of multimodal demonstrations.

Representation Learning

MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control

1 code implementation • 15 Aug 2022 • Nolan Wagener, Andrey Kolobov, Felipe Vieira Frujeri, Ricky Loynd, Ching-An Cheng, Matthew Hausknecht

We demonstrate the utility of MoCapAct by using it to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control, and we show that the learned low-level component can be reused to efficiently learn downstream high-level tasks.

Humanoid Control
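
The reuse pattern described above can be sketched as follows, with all components as toy stand-ins for the actual MoCapAct models: a frozen low-level controller consumes a skill input, and only a new high-level policy is trained for the downstream task.

```python
# Conceptual sketch (stand-ins, not MoCapAct code): frozen low-level
# tracking policy + trainable high-level policy emitting skill embeddings.
import numpy as np

rng = np.random.default_rng(0)
OBS, SKILL, ACT = 10, 4, 3

W_low = rng.normal(size=(ACT, OBS + SKILL))   # frozen after pretraining

def low_level(obs, skill):
    """Frozen low-level controller: (obs, skill) -> action."""
    return np.tanh(W_low @ np.concatenate([obs, skill]))

class HighLevelPolicy:
    """Trainable high-level policy: obs -> skill embedding."""
    def __init__(self):
        self.W = rng.normal(size=(SKILL, OBS)) * 0.1

    def __call__(self, obs):
        return np.tanh(self.W @ obs)

pi_hi = HighLevelPolicy()
obs = rng.normal(size=OBS)
action = low_level(obs, pi_hi(obs))   # only pi_hi.W would be updated by RL
print(action)
```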

The Sandbox Environment for Generalizable Agent Research (SEGAR)

1 code implementation • 19 Mar 2022 • R Devon Hjelm, Bogdan Mazoure, Florian Golemo, Felipe Frujeri, Mihai Jalobeanu, Andrey Kolobov

A broad challenge of research on generalization for sequential decision-making tasks in interactive environments is designing benchmarks that clearly landmark progress.

Decision Making

Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL

1 code implementation • ICLR 2022 • Bogdan Mazoure, Ahmed M. Ahmed, Patrick MacAlpine, R Devon Hjelm, Andrey Kolobov

A highly desirable property of a reinforcement learning (RL) agent -- and a major difficulty for deep RL approaches -- is the ability to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.

Reinforcement Learning (RL) • Representation Learning • +1

Policy Improvement via Imitation of Multiple Oracles

no code implementations • NeurIPS 2020 • Ching-An Cheng, Andrey Kolobov, Alekh Agarwal

In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles.

Imitation Learning
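
The proposed baseline is simple to state: take the maximum over the oracles' value functions at each state. An illustrative sketch with made-up values:

```python
# Minimal sketch of the abstract's idea (names are illustrative): with K
# oracle policies, use the state-wise maximum of their value functions as
# the baseline, so conflicting oracles are resolved per state.
import numpy as np

def max_aggregated_baseline(oracle_values):
    """oracle_values: (K, num_states) array of V^{pi_k}(s); max over k."""
    return oracle_values.max(axis=0)

# Toy example: oracle 0 is better in states 0-1, oracle 1 in state 2.
V = np.array([[1.0, 0.8, 0.2],
              [0.3, 0.5, 0.9]])
baseline = max_aggregated_baseline(V)   # [1.0, 0.8, 0.9]
best_oracle = V.argmax(axis=0)          # which oracle to imitate per state
print(baseline, best_oracle)
```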

Online Learning for Active Cache Synchronization

1 code implementation • ICML 2020 • Andrey Kolobov, Sébastien Bubeck, Julian Zimmert

Existing multi-armed bandit (MAB) models make two implicit assumptions: an arm generates a payoff only when it is played, and the agent observes every payoff that is generated.
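
A toy simulation (not the paper's algorithm) makes the relaxed assumptions concrete: pages change whether or not they are polled, and the agent observes a change only when it polls.

```python
# Toy illustration of the synchronization setting: each "arm" is a cached
# page that changes on its own schedule; observation happens only on a poll.
import random

random.seed(0)
change_rates = [0.05, 0.3, 0.7]          # unknown per-page change probabilities
stale = [False] * len(change_rates)       # payoff accrues even when not played
observed_changes = [0] * len(change_rates)
polls = [0] * len(change_rates)

for t in range(1000):
    # Every page may change this step, regardless of whether it is polled.
    for i, rate in enumerate(change_rates):
        if random.random() < rate:
            stale[i] = True
    # Budget of one poll per step; round-robin stands in for a smart policy.
    i = t % len(change_rates)
    polls[i] += 1
    if stale[i]:                          # observation happens only on a poll
        observed_changes[i] += 1
        stale[i] = False                  # the poll refreshes the cached copy

est = [c / p for c, p in zip(observed_changes, polls)]
print("estimated change-per-poll:", [round(e, 2) for e in est])
```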

Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling

1 code implementation • NeurIPS 2019 • Andrey Kolobov, Yuval Peres, Cheng Lu, Eric J. Horvitz

From traditional Web search engines to virtual assistants and Web accelerators, services that rely on online information need to continually keep track of remote content changes by explicitly requesting content updates from remote sources (e.g., web pages).

reinforcement-learning • Reinforcement Learning (RL) • +1

Autonomous Thermalling as a Partially Observable Markov Decision Process (Extended Version)

1 code implementation • 24 May 2018 • Iain Guilliard, Richard Rogahn, Jim Piavis, Andrey Kolobov

Small uninhabited aerial vehicles (sUAVs) commonly rely on active propulsion to stay airborne, which limits flight time and range.

Robotics • Systems and Control

Metareasoning for Planning Under Uncertainty

no code implementations • 3 May 2015 • Christopher H. Lin, Andrey Kolobov, Ece Kamar, Eric Horvitz

Our work subsumes previously studied special cases of metareasoning and shows that in the general case, metareasoning is at most polynomially harder than solving MDPs with any given algorithm that disregards the cost of thinking.
