In this paper, we provide an introduction to derivative-free optimization
algorithms that can potentially be applied to train deep learning models.
Existing deep learning model training is mostly based on the back propagation
algorithm, which updates the model variables layer by layer with the gradient
descent algorithm or its variants...
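As a point of reference, the following is a minimal sketch (not taken from this paper) of the first-order update that back propagation applies to each layer's variables; the function name, learning rate, and toy objective are illustrative assumptions.

```python
# Minimal illustrative sketch (not the paper's code) of the first-order update
# used by back propagation: theta <- theta - lr * gradient_of_loss(theta).
import numpy as np

def gradient_descent_step(theta, grad, lr=0.1):
    """Apply one gradient descent update to the variables of a single layer."""
    return theta - lr * grad

# Toy example: one step on f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
theta = gradient_descent_step(theta, grad=2 * theta)
print(theta)  # the variables move toward the minimizer at the origin
```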
However, the objective
functions of deep learning models are usually non-convex, and gradient descent
algorithms based on the first-order derivative can easily get stuck in local
optima. To address this problem, various local or global optimization
algorithms have been proposed, which can greatly improve the training of deep
learning models. Representative examples include the Bayesian methods, the
Shubert-Piyavskii algorithm, Direct, LIPO, MCS, GA, SCE, DE, PSO, ES, CMA-ES,
hill climbing and simulated annealing. Some of these algorithms will be
introduced in this paper (including the Bayesian method and the Lipschitzian
approaches, e.g., the Shubert-Piyavskii algorithm, Direct, LIPO and MCS), and
the remaining algorithms (including the population-based optimization
algorithms, e.g., GA, SCE, DE, PSO, ES and CMA-ES, and the random search
algorithms, e.g., hill climbing and simulated annealing) will be introduced in
detail in the follow-up paper [18].
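To make the contrast with first-order training concrete, here is a minimal sketch of one of the simplest derivative-free methods listed above, a random-search hill climber; the step size, iteration budget, and Rastrigin-style toy objective are illustrative assumptions rather than details from the paper.

```python
# Minimal illustrative sketch of a derivative-free random-search hill climber:
# it needs only objective evaluations, no gradients of the loss.
import numpy as np

def hill_climb(objective, x0, step=0.1, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = objective(x)
    for _ in range(iters):
        candidate = x + step * rng.standard_normal(x.shape)  # random perturbation
        fc = objective(candidate)
        if fc < fx:            # keep the move only if it improves the objective
            x, fx = candidate, fc
    return x, fx

# Toy non-convex objective (Rastrigin-style) with many local optima.
f = lambda x: 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))
print(hill_climb(f, x0=[3.0, -2.5]))
```

Because the update relies only on objective evaluations, such a method does not need the gradient at all; on a non-convex landscape like this one, however, a pure hill climber can itself stall in a local optimum, which is what motivates the global strategies surveyed in the paper.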