Continuous Control
422 papers with code • 73 benchmarks • 10 datasets
In artificial intelligence (AI) and machine learning (ML), continuous control refers to the ability to make smooth, ongoing adjustments to steer a game or simulation. This contrasts with discrete control, where actions are restricted to a set of specific, distinct choices. Continuous control is crucial in environments where precision, timing, and the magnitude of actions matter, such as driving a car in a racing game, controlling a character in a physics simulation, or managing the flight of an aircraft in a flight simulator.
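The distinction is easiest to see in an environment's action space. Below is a minimal sketch using the Gymnasium library (an assumption for illustration; any environment with a real-valued `Box` action space behaves the same way). In the classic Pendulum-v1 task, the action is any real-valued torque in a continuous range rather than one of a few fixed choices.

```python
# Minimal illustration of a continuous action space,
# assuming the Gymnasium library (pip install gymnasium).
import gymnasium as gym

# Pendulum-v1 is a classic continuous control task: the action is a
# real-valued torque in [-2, 2], not a pick from a finite menu.
env = gym.make("Pendulum-v1")
print(env.action_space)  # Box(-2.0, 2.0, (1,), float32)

obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()  # any torque in the range is valid
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```

By contrast, a discrete task such as CartPole-v1 exposes `Discrete(2)`: the agent can only choose "push left" or "push right", with no control over magnitude.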
Libraries
Use these libraries to find Continuous Control models and implementations.
Latest papers with no code
Probabilistic Actor-Critic: Learning to Explore with PAC-Bayes Uncertainty
We introduce Probabilistic Actor-Critic (PAC), a novel reinforcement learning algorithm with improved continuous control performance thanks to its ability to mitigate the exploration-exploitation trade-off.
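As a rough illustration of trading off exploration and exploitation via critic uncertainty, here is a generic ensemble heuristic: score candidate actions by the mean Q-value plus an optimism bonus proportional to the ensemble's disagreement. This is a common sketch of uncertainty-aware action selection, not PAC's actual PAC-Bayes objective; `policy` and `critics` are assumed callables.

```python
# Generic sketch of uncertainty-aware action selection with a critic
# ensemble; illustrates the exploration-exploitation idea only, and is
# NOT the PAC algorithm from the paper above.
import numpy as np

def select_action(critics, policy, state, n_candidates=16, bonus_weight=0.5):
    """Pick among sampled candidate actions by mean Q plus an uncertainty bonus."""
    candidates = [policy(state) for _ in range(n_candidates)]  # stochastic policy samples
    scores = []
    for a in candidates:
        qs = np.array([q(state, a) for q in critics])  # one estimate per ensemble member
        scores.append(qs.mean() + bonus_weight * qs.std())  # optimism under uncertainty
    return candidates[int(np.argmax(scores))]
```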
Frugal Actor-Critic: Sample Efficient Off-Policy Deep Reinforcement Learning Using Unique Experiences
Efficient utilization of the replay buffer plays a significant role in the off-policy actor-critic reinforcement learning (RL) algorithms used for model-free control policy synthesis for complex dynamical systems.
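As a toy illustration of retaining only unique experiences, the buffer below skips transitions whose (state, action) pair has already been stored. The deduplication rule, hashing rounded state-action pairs, is an assumption made for this sketch, not the paper's actual criterion.

```python
# Toy replay buffer that keeps only "unique" transitions. The hash of
# rounded (state, action) pairs is an illustrative dedup rule, not the
# selection criterion from the paper above.
import random
import numpy as np

class UniqueReplayBuffer:
    def __init__(self, capacity=100_000, decimals=3):
        self.capacity = capacity
        self.decimals = decimals  # rounding granularity for the dedup key
        self.storage = []
        self.keys = set()

    def _key(self, state, action):
        return (np.round(state, self.decimals).tobytes(),
                np.round(action, self.decimals).tobytes())

    def add(self, state, action, reward, next_state, done):
        key = self._key(state, action)
        if key in self.keys:  # skip near-duplicate transitions
            return
        if len(self.storage) >= self.capacity:  # evict oldest when full
            old = self.storage.pop(0)
            self.keys.discard(self._key(old[0], old[1]))
        self.storage.append((state, action, reward, next_state, done))
        self.keys.add(key)

    def sample(self, batch_size):
        return random.sample(self.storage, min(batch_size, len(self.storage)))
```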
A Strategy for Preparing Quantum Squeezed States Using Reinforcement Learning
It is exemplified by the application to prepare spin-squeezed states for an open collective spin model where a linear control field is designed to govern the dynamics.
Pulse Width Modulation Method Applied to Nonlinear Model Predictive Control on an Under-actuated Small Satellite
Among various satellite actuators, magnetic torquers have been widely equipped for stabilization and attitude control of small satellites.
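For background on the general idea: pulse width modulation approximates a continuous command with an on/off actuator (such as a magnetic torquer) by switching it fully on for a fraction of each control period. The sketch below is a generic PWM scheme, not the paper's specific method.

```python
# Generic PWM sketch: a continuous command u in [-u_max, u_max] is
# approximated over one control period by an on/off signal whose
# duty cycle matches |u| / u_max. Not the method from the paper above.
import numpy as np

def pwm_signal(u, u_max, period, dt):
    """Return on/off actuation samples whose average approximates command u."""
    duty = min(abs(u) / u_max, 1.0)         # fraction of the period spent "on"
    n_steps = int(round(period / dt))
    n_on = int(round(duty * n_steps))
    sign = np.sign(u)
    return np.array([sign * u_max] * n_on + [0.0] * (n_steps - n_on))

# The time-average of the PWM signal matches the continuous command:
samples = pwm_signal(u=0.3, u_max=1.0, period=1.0, dt=0.1)
print(samples.mean())  # ~0.3
```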
Identifying Policy Gradient Subspaces
Policy gradient methods hold great potential for solving complex continuous control tasks.
The Distributional Reward Critic Architecture for Perturbed-Reward Reinforcement Learning
We study reinforcement learning in the presence of an unknown reward perturbation.
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Our approach is maximalist in that it provably handles non-Markovian, intransitive, and stochastic preferences while being robust to the compounding errors that plague offline approaches to sequential prediction.
Trajectory-Oriented Policy Optimization with Sparse Rewards
The proposed algorithm undergoes evaluation across extensive discrete and continuous control tasks with sparse and misleading rewards.
Adversarially Trained Actor Critic for Offline CMDPs
Theoretically, we demonstrate that when the actor employs a no-regret optimization oracle, SATAC achieves two guarantees: (i) For the first time in the offline RL setting, we establish that SATAC can produce a policy that outperforms the behavior policy while maintaining the same level of safety, which is critical to designing an algorithm for offline RL.
Ensemble-based Interactive Imitation Learning
We study interactive imitation learning, where a learner interactively queries a demonstrating expert for action annotations, aiming to learn a policy that has performance competitive with the expert, using as few annotations as possible.
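As a toy sketch of one common recipe for deciding when to query: the learner acts autonomously when an ensemble of policies agrees on the action and asks the expert for a label otherwise, so annotations are spent only on uncertain states. The disagreement test and threshold here are illustrative assumptions, not necessarily the paper's rule.

```python
# Toy sketch of ensemble-based expert querying for interactive imitation
# learning. `policies` is a list of action-predicting callables; `expert`
# returns a ground-truth action. The disagreement rule is an assumption,
# not the algorithm from the paper above.
import numpy as np

def step_with_ensemble(policies, state, expert, dataset, threshold=0.1):
    """One interaction step: act on the ensemble mean, query when uncertain."""
    actions = np.stack([p(state) for p in policies])  # (n_members, action_dim)
    disagreement = actions.std(axis=0).max()          # worst per-dimension std
    if disagreement > threshold:
        action = expert(state)           # query the expensive expert label
        dataset.append((state, action))  # add it to the training set
    else:
        action = actions.mean(axis=0)    # confident: act autonomously
    return action
```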