1 code implementation • 25 Jun 2022 • Tongtian Zhu, Fengxiang He, Lan Zhang, Zhengyang Niu, Mingli Song, DaCheng Tao
Our theory indicates that the generalizability of decentralized SGD (D-SGD) is positively correlated with the spectral gap of the communication topology, and it explains why consensus control in the initial training phase can ensure better generalization.
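To make the spectral gap concrete, here is a minimal sketch in plain Python. It is not the authors' code. It assumes the common convention that the spectral gap of a symmetric doubly stochastic gossip matrix W is 1 - max(|λ2|, |λn|), and it estimates that quantity by power iteration on the part of W orthogonal to the consensus (all-ones) direction. The function names and the ring topology with uniform 1/3 weights are illustrative assumptions, not taken from the paper.

```python
import math
import random


def ring_gossip_matrix(n):
    """Hypothetical example topology: a ring of n >= 3 nodes where each node
    averages itself and its two neighbors with weight 1/3 each."""
    if n < 3:
        raise ValueError("ring needs at least 3 nodes")
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def complete_gossip_matrix(n):
    """Fully connected topology: every node averages over all n nodes."""
    return [[1.0 / n] * n for _ in range(n)]


def spectral_gap(W, iters=2000, seed=0):
    """Estimate 1 - max(|lambda_2|, |lambda_n|) for a symmetric doubly
    stochastic W (assumed definition) via power iteration restricted to
    the subspace orthogonal to the all-ones vector."""
    n = len(W)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        # Project out the consensus direction (eigenvalue 1).
        mean = sum(v) / n
        v = [x - mean for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        if norm < 1e-12:
            # W maps everything onto consensus in one step: gap is 1.
            return 1.0
        v = [x / norm for x in v]
        w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        # For a unit vector v, ||W v|| converges to the largest |eigenvalue|
        # among the non-consensus eigenvectors.
        lam = math.sqrt(sum(x * x for x in w))
        v = w
    return 1.0 - lam


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, "ring:", spectral_gap(ring_gossip_matrix(n)),
              "complete:", spectral_gap(complete_gossip_matrix(n)))
```

Under these assumptions, the ring's spectral gap shrinks as nodes are added, while the fully connected topology keeps a gap of 1. Read alongside the statement above, this suggests that sparser topologies are the ones the theory associates with weaker generalization.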