Methods > Reinforcement Learning > Policy Gradient Methods

Soft Actor-Critic (SAC) is an off-policy actor-critic deep reinforcement learning algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy; that is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. SAC combines off-policy updates with a stable stochastic actor-critic formulation.
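
Concretely, the maximum entropy objective used in the SAC paper augments the expected return with the entropy of the policy at each visited state, weighted by a temperature parameter α that trades off reward against exploration:

J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]

Setting α = 0 recovers the conventional expected-reward objective.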

The SAC objective has a number of advantages. First, the policy is incentivized to explore more widely while giving up on clearly unpromising avenues. Second, the policy can capture multiple modes of near-optimal behavior: in problem settings where multiple actions seem equally attractive, the policy commits equal probability mass to each of them. Lastly, the authors present evidence that it improves learning speed over state-of-the-art methods that optimize the conventional RL objective.
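
As a rough illustration of how the entropy term enters the learning updates, the sketch below computes the soft Bellman target that a typical SAC implementation regresses its critics toward. This is a minimal PyTorch-style sketch under assumed interfaces: the `policy`, `q1_target`, and `q2_target` callables and the argument names are hypothetical placeholders, not the authors' reference code.

```python
import torch

def soft_bellman_target(reward, next_obs, done, policy, q1_target, q2_target,
                        alpha=0.2, gamma=0.99):
    """Entropy-regularized TD target that the soft Q-functions are regressed toward.

    `policy(next_obs)` is assumed to return a sampled next action and its
    log-probability; `q1_target` / `q2_target` are the two target critics
    (clipped double-Q). All names here are illustrative placeholders.
    """
    with torch.no_grad():
        next_action, next_log_prob = policy(next_obs)
        # Take the minimum of the two target critics to reduce overestimation.
        q_next = torch.min(q1_target(next_obs, next_action),
                           q2_target(next_obs, next_action))
        # Subtracting alpha * log pi(a'|s') adds the entropy bonus to the backup.
        soft_value = q_next - alpha * next_log_prob
        return reward + gamma * (1.0 - done) * soft_value
```

The same log-probability term reappears in the actor update, which pushes the policy toward actions with high critic value while keeping its entropy high.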

Source: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Latest Papers

Stay Alive with Many Options: A Reinforcement Learning Approach for Autonomous Navigation
Ambedkar Dukkipati, Rajarshi Banerjee, Ranga Shaarad Ayyagari, Dhaval Parmar Udaybhai
2021-01-30
Factored Action Spaces in Deep Reinforcement Learning
Anonymous
2021-01-01
Learning from Simulation, Racing in Reality
Eugenio Chisari, Alexander Liniger, Alisa Rupenyan, Luc van Gool, John Lygeros
2020-11-26
Deep RL With Information Constrained Policies: Generalization in Continuous Control
Tyler Malloy, Chris R. Sims, Tim Klinger, Miao Liu, Matthew Riemer, Gerald Tesauro
2020-10-09
EMaQ: Expected-Max Q-Learning Operator for Simple Yet Effective Offline and Online RL
Seyed Kamyar Seyed Ghasemipour, Dale Schuurmans, Shixiang Shane Gu
2020-07-21
Developing cooperative policies for multi-stage tasks
Jordan Erskine, Chris Lehnert
2020-07-01
Experience Replay with Likelihood-free Importance Weights
Samarth Sinha, Jiaming Song, Animesh Garg, Stefano Ermon
2020-06-23
Generalized State-Dependent Exploration for Deep Reinforcement Learning in Robotics
Antonin Raffin, Freek Stulp
2020-05-12
DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning
Xiaoteng Ma, Li Xia, Zhengyuan Zhou, Jun Yang, Qianchuan Zhao
2020-04-30
Adaptive Experience Selection for Policy Gradient
Saad Mohamad, Giovanni Montana
2020-02-17
Cooperative Highway Work Zone Merge Control based on Reinforcement Learning in A Connected and Automated Environment
Tianzhu Ren, Yuanchang Xie, Liming Jiang
2020-01-21
Discriminator Soft Actor Critic without Extrinsic Rewards
Daichi Nishio, Daiki Kuyoshi, Toi Tsuneda, Satoshi Yamane
2020-01-19
SLM Lab: A Comprehensive Benchmark and Modular Software Framework for Reproducible Deep Reinforcement Learning
Keng Wah Loon, Laura Graesser, Milan Cvitkovic
2019-12-28
Policy Optimization Reinforcement Learning with Entropy Regularization
Jingbin Liu, Xinyang Gu, Shuai Liu
2019-12-02
Better Exploration with Optimistic Actor Critic
Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann
2019-12-01
Data-efficient Co-Adaptation of Morphology and Behaviour with Deep Reinforcement Learning
Kevin Sebastian Luck, Heni Ben Amor, Roberto Calandra
2019-11-15
Better Exploration with Optimistic Actor-Critic
Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann
2019-10-28
Striving for Simplicity and Performance in Off-Policy DRL: Output Normalization and Non-Uniform Sampling
Che Wang, Yanqiu Wu, Quan Vuong, Keith Ross
2019-10-05
Modified Actor-Critics
Erinc Merdivan, Sten Hanke, Matthieu Geist
2019-07-02
Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies
Patrick Nadeem Ward, Ariella Smofsky, Avishek Joey Bose
2019-06-06
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
Kurtland Chua, Roberto Calandra, Rowan McAllister, Sergey Levine
2018-05-30
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
2018-01-04

Tasks

TASK PAPERS SHARE
Continuous Control 10 41.67%
OpenAI Gym 3 12.50%
Decision Making 2 8.33%
Efficient Exploration 2 8.33%
Autonomous Navigation 1 4.17%
Hierarchical Reinforcement Learning 1 4.17%
Dota 2 1 4.17%
Starcraft 1 4.17%
Offline RL 1 4.17%

Categories