no code implementations • 9 Aug 2022 • Qingguo Hong, Jonathan W. Siegel, Qinyang Tan, Jinchao Xu
Our empirical studies also show that neural networks with the Hat activation function train significantly faster when optimized with stochastic gradient descent and Adam.
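A minimal sketch of how such an activation could be used, assuming "Hat" refers to the standard piecewise-linear hat function (0 outside [0, 2], rising to 1 at x = 1), which can be written as a fixed combination of ReLUs; the exact support and scale used by the authors may differ, and the toy model, data, and hyperparameters below are illustrative only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Hat(nn.Module):
    """Assumed piecewise-linear hat activation: 0 for x <= 0, x on [0, 1],
    2 - x on [1, 2], and 0 for x >= 2. Equivalently,
    Hat(x) = ReLU(x) - 2*ReLU(x - 1) + ReLU(x - 2)."""
    def forward(self, x):
        return F.relu(x) - 2 * F.relu(x - 1) + F.relu(x - 2)

# Hypothetical usage: a small MLP with Hat activations, trained with Adam
# on a toy 1-D regression problem.
model = nn.Sequential(nn.Linear(1, 64), Hat(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.linspace(-1.0, 1.0, 256).unsqueeze(1)
y = torch.sin(torch.pi * x)  # toy regression target

for step in range(1000):
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```

Swapping `torch.optim.Adam` for `torch.optim.SGD` reproduces the other optimizer setting the abstract compares against.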