Continuous Control

413 papers with code • 73 benchmarks • 9 datasets

Continuous control, in the context of game playing within artificial intelligence (AI) and machine learning (ML), refers to tasks in which an agent controls a game or simulation through a series of smooth, ongoing adjustments, with each action taking a value from a continuous range. This contrasts with discrete control, where actions are limited to a finite set of distinct choices. Continuous control is crucial in environments where the precision, timing, and magnitude of actions matter, such as driving a car in a racing game, controlling a character in a simulation, or managing the flight of an aircraft in a flight simulator.
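The difference between the two action types can be illustrated with a small sketch. Everything here is hypothetical and chosen for illustration only: the steering example, the `[-1.0, 1.0]` action range, and the function names are assumptions, not part of any benchmark or library listed on this page.

```python
import random

# Hypothetical discrete action set for a steering task (an assumption for illustration).
DISCRETE_ACTIONS = ("left", "straight", "right")


def sample_discrete_action(rng):
    """Pick one of a finite set of distinct choices."""
    return rng.choice(DISCRETE_ACTIONS)


def sample_continuous_action(rng, low=-1.0, high=1.0):
    """Pick a real-valued steering command anywhere in [low, high]."""
    return rng.uniform(low, high)


def clip_action(action, low=-1.0, high=1.0):
    """Keep a continuous action inside its valid range."""
    return max(low, min(high, action))


def discretize(action, low=-1.0, high=1.0, bins=3):
    """Map a continuous action to a bin index, discarding its exact magnitude."""
    clipped = clip_action(action, low, high)
    index = int((clipped - low) / (high - low) * bins)
    return min(bins - 1, index)  # the upper bound falls into the last bin


if __name__ == "__main__":
    rng = random.Random(0)
    print("discrete:", sample_discrete_action(rng))
    print("continuous:", sample_continuous_action(rng))
    # A gentle right turn (0.3) and no turn (0.0) collapse to the same bin.
    print("bins:", discretize(0.3), discretize(0.0))
```

In this sketch, discretizing into three bins maps both `0.3` and `0.0` to the same choice, which is the kind of magnitude information a continuous action space retains.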

Benchmarks

These leaderboards are used to track progress in Continuous Control.

Dataset	Best Model
PyBullet HalfCheetah	SAC
PyBullet Walker2D	SAC gSDE
PyBullet Ant	SAC gSDE
PyBullet Hopper	SAC gSDE
Lunar Lander (OpenAI Gym)	MAC
DeepMind Cheetah Run (Images)	DreamerV1
DeepMind Cup Catch (Images)	DrQ
DeepMind Walker Walk (Images)	DrQ
cartpole.swingup	SMuZero
cheetah.run	SMuZero
finger.turn_hard	SMuZero
walker.stand	SMuZero
walker.walk	SMuZero
Cart-Pole Balancing	TRPO
Inverted Pendulum	TRPO
Mountain Car	TRPO
Acrobot	TRPO
Double Inverted Pendulum	TRPO
Swimmer	TRPO
Hopper	TRPO
2D Walker	TRPO
Half-Cheetah	TRPO
Ant	TRPO
Simple Humanoid	TRPO
Full Humanoid	TRPO
Cart-Pole Balancing (limited sensors)	TRPO
Inverted Pendulum (limited sensors)	TRPO
Mountain Car (limited sensors)	TRPO
Acrobot (limited sensors)	TRPO
Cart-Pole Balancing (noisy observations)	TRPO
Inverted Pendulum (noisy observations)	TRPO
Mountain Car (noisy observations)	TRPO
Acrobot (noisy observations)	TRPO
Cart-Pole Balancing (system identifications)	TRPO
Inverted Pendulum (system identifications)	TRPO
Mountain Car (system identifications)	TRPO
Acrobot (system identifications)	TRPO
Swimmer + Gathering	TRPO
Ant + Gathering	TRPO
Swimmer + Maze	TRPO
Ant + Maze	TRPO
Cart Pole (OpenAI Gym)	MAC
Finger, spin (DMControl500k)	CURL
Cartpole, swingup (DMControl500k)	CURL
Reacher, easy (DMControl500k)	CURL
Cheetah, run (DMControl500k)	CURL
Walker, walk (DMControl500k)	CURL
Ball in cup, catch (DMControl500k)	CURL
Finger, spin (DMControl100k)	CURL
Cartpole, swingup (DMControl100k)	CURL
Reacher, easy (DMControl100k)	CURL
Cheetah, run (DMControl100k)	CURL
Walker, walk (DMControl100k)	CURL
Ball in cup, catch (DMControl100k)	CURL
acrobot.swingup	SMuZero
cartpole.balance	SMuZero
cartpole.balance_sparse	SMuZero
cartpole.swingup_sparse	SMuZero
ball_in_cup.catch	SMuZero
finger.spin	SMuZero
finger.turn_easy	SMuZero
hopper.hop	SMuZero
hopper.stand	SMuZero
pendulum.swingup	SMuZero
quadruped.run	SMuZero
quadruped.walk	SMuZero
reacher.easy	SMuZero
reacher.hard	SMuZero
walker.run	SMuZero
fish.swim	MuZero Unplugged
manipulator.insert_ball	MuZero Unplugged
manipulator.insert_peg	MuZero Unplugged
humanoid.run	MuZero Unplugged

Libraries

Use these libraries to find Continuous Control models and implementations

DLR-RM/stable-baselines3 • 8 papers • 7,945 GitHub stars
hill-a/stable-baselines • 7 papers • 4,042 GitHub stars
opendilab/DI-engine • 7 papers • 2,555 GitHub stars
Kaixhin/imitation-learning • 6 papers • 386 GitHub stars

See all 33 libraries.

Latest papers

World Models via Policy-Guided Trajectory Diffusion

marc-rigter/polygrad-world-models • 13 Dec 2023

Our results demonstrate that PolyGRAD outperforms state-of-the-art baselines in terms of trajectory prediction error for short trajectories, with the exception of autoregressive diffusion.

Decoupling Meta-Reinforcement Learning with Gaussian Task Contexts and Skills

hehongc/DCMRL • 11 Dec 2023

We propose a framework called decoupled meta-reinforcement learning (DCMRL), which (1) contrastively restricts the learning of task contexts through pulling in similar task contexts within the same task and pushing away different task contexts of different tasks, and (2) utilizes a Gaussian quantization variational autoencoder (GQ-VAE) for clustering the Gaussian distributions of the task contexts and skills respectively, and decoupling the exploration and learning processes of their spaces.

DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization

XuGW-Kevin/DrM • 30 Oct 2023

To quantify this inactivity, we adopt dormant ratio as a metric to measure inactivity in the RL agent's network.

TD-MPC2: Scalable, Robust World Models for Continuous Control

nicklashansen/tdmpc2 • 25 Oct 2023

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model.

205 GitHub stars

Absolute Policy Optimization

intelligent-control-lab/absolute-policy-optimization • 20 Oct 2023

In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios.

Reduced Policy Optimization for Continuous Control with Hard Constraints

wadx2019/rpo • NeurIPS 2023

To the best of our knowledge, RPO is the first attempt that introduces GRG to RL as a way of efficiently handling both equality and inequality hard constraints.

14 Oct 2023

Boosting Continuous Control with Consistency Policy

cccedric/cpql • 10 Oct 2023

By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance when updating diffusion model-based policy with the learned Q-function.

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

nathanrahn/return-landscapes • NeurIPS 2023

To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy.

26 Sep 2023

Learning Shared Safety Constraints from Multi-task Demonstrations

konwook/mticl • NeurIPS 2023

Regardless of the particular task we want them to perform in an environment, there are often shared safety constraints we want our agents to respect.

01 Sep 2023

Stabilizing Unsupervised Environment Design with a Learned Adversary

facebookresearch/dcd • 21 Aug 2023

As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment.

110 GitHub stars
