RMSProp

RMSProp is an unpublished adaptive learning rate optimizer proposed by Geoff Hinton. The motivation is that the magnitude of the gradient can differ greatly between weights and can change during learning, which makes it hard to choose a single global learning rate. RMSProp addresses this by keeping a moving average of the squared gradient for each weight and dividing that weight's update by the square root of this average. The updates are:

$$E\left[g^{2}\right]_{t} = 0.9E\left[g^{2}\right]_{t-1} + 0.1g^{2}_{t}$$

$$\theta_{t+1} = \theta_{t} - \frac{\eta}{\sqrt{E\left[g^{2}\right]_{t} + \epsilon}}g_{t}$$

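Here $g_{t}$ is the gradient at step $t$, $\eta$ is the learning rate, and $\epsilon$ is a small constant for numerical stability. The coefficient $0.9$ in the moving average is the decay rate $\gamma$ (the new squared gradient is weighted by $1-\gamma = 0.1$). Hinton suggests $\gamma=0.9$, and a good default for $\eta$ is $0.001$.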
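The two update equations can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `rmsprop_step`, the value `eps=1e-8`, the zero initialization of the moving average, and the toy quadratic objective are all assumptions made for the example and are not given in the text above.

```python
import numpy as np

def rmsprop_step(theta, grad, avg_sq, lr=0.001, gamma=0.9, eps=1e-8):
    """One RMSProp update; returns the new parameters and the new moving average."""
    # E[g^2]_t = gamma * E[g^2]_{t-1} + (1 - gamma) * g_t^2
    avg_sq = gamma * avg_sq + (1.0 - gamma) * grad ** 2
    # theta_{t+1} = theta_t - lr / sqrt(E[g^2]_t + eps) * g_t  (eps inside the root)
    theta = theta - lr / np.sqrt(avg_sq + eps) * grad
    return theta, avg_sq

# Toy objective (assumed for illustration): f(theta) = 0.5 * sum(scales * theta**2).
# Its two coordinates have gradients that differ by a factor of 10^4.
scales = np.array([100.0, 0.01])
theta = np.array([1.0, 1.0])
avg_sq = np.zeros_like(theta)  # assumed initialization E[g^2]_0 = 0

for _ in range(5):
    grad = scales * theta  # gradient of the toy objective
    theta, avg_sq = rmsprop_step(theta, grad, avg_sq)

print(theta)
```

On the first step both coordinates move by roughly $\eta / \sqrt{1-\gamma} \approx 0.0032$ even though their gradients differ by four orders of magnitude, which is the per-weight rescaling described above.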

Image: Alec Radford

Tasks

| Task | Papers | Share |
|---|---|---|
| Image Classification | 28 | 27.45% |
| Object Detection | 6 | 5.88% |
| Machine Translation | 5 | 4.90% |
| Semantic Segmentation | 5 | 4.90% |
| Language Modelling | 5 | 4.90% |
| Quantization | 4 | 3.92% |
| Atari Games | 3 | 2.94% |
| Adversarial Attack | 3 | 2.94% |
| Object Recognition | 3 | 2.94% |
