ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

1 Jun 2020 · Zhewei Yao, Amir Gholami, Sheng Shen, Kurt Keutzer, Michael W. Mahoney

We introduce AdaHessian, a second-order stochastic optimization algorithm that dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian. Second-order algorithms are among the most powerful optimization algorithms, with superior convergence properties compared to first-order methods such as SGD and ADAM...
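
To make the idea of an adaptive curvature estimate concrete, here is a minimal sketch, not the authors' implementation. The abstract excerpt above does not say how the Hessian is estimated; this sketch assumes a Hutchinson-style randomized estimate of the Hessian diagonal, computed from Hessian-vector products only, and applies it to a toy quadratic loss. The names `hutchinson_diag`, `hvp`, `A`, `lr`, and `eps` are hypothetical and chosen for illustration.

```python
import numpy as np

def hutchinson_diag(hvp, dim, n_samples, rng):
    """Estimate diag(H) by averaging z * (H z) over random sign vectors z.

    hvp is any function returning the Hessian-vector product H @ v,
    so the full Hessian never has to be formed.
    """
    est = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher vector
        est += z * hvp(z)
    return est / n_samples

# Toy quadratic loss f(x) = 0.5 * x^T A x (an assumption for illustration).
# Its gradient is A @ x and its Hessian is A.
A = np.array([[4.0, 1.0],
              [1.0, 0.5]])
grad = lambda x: A @ x
hvp = lambda v: A @ v

rng = np.random.default_rng(0)
diag_est = hutchinson_diag(hvp, dim=2, n_samples=200, rng=rng)

# One curvature-scaled step: each coordinate of the gradient is divided by
# the magnitude of its estimated curvature, so steep directions take
# smaller steps and flat directions take larger ones.
x = np.array([1.0, 1.0])
lr, eps = 0.5, 1e-8
x_new = x - lr * grad(x) / (np.abs(diag_est) + eps)
```

The estimate is noisy for a small number of samples, which is one reason a practical optimizer would smooth it over iterations rather than use a single draw. When the Hessian is exactly diagonal, a single sign vector already recovers the diagonal, because each entry of `z` squares to 1.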
