On Adversarial Robustness of Small vs Large Batch Training

17 May 2019  ·  Sandesh Kamath, Amit Deshpande, K V Subrahmanyam

Large-batch training is known to suffer from poor generalization (Jastrzebski et al., 2017) as well as poor adversarial robustness (Yao et al., 2018b). The Hessian-based analysis of large-batch training by Yao et al. (2018b) concludes that adversarial training, like small-batch training, leads to a smaller Hessian spectrum. They combine adversarial training with second-order information to devise a new large-batch training algorithm that obtains robust models with good generalization. In this paper, we empirically observe that networks trained with a constant learning-rate-to-batch-size ratio, as proposed by Jastrzebski et al. (2017), not only generalize better but also retain roughly constant adversarial robustness across all batch sizes.
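The training scheme referenced in the abstract keeps the ratio of learning rate to batch size fixed, so the learning rate is scaled linearly with the batch size. Below is a minimal sketch of that scaling rule, not code from the paper; the base learning rate, base batch size, and stand-in model are hypothetical placeholders for illustration.

```python
# Sketch: keep lr / batch_size constant when scaling the batch size
# (the rule proposed by Jastrzebski et al., 2017). Base values are assumptions.
import torch
import torch.nn as nn

BASE_LR = 0.01          # hypothetical learning rate tuned at the base batch size
BASE_BATCH_SIZE = 128   # hypothetical base batch size

def scaled_lr(batch_size, base_lr=BASE_LR, base_batch_size=BASE_BATCH_SIZE):
    """Scale the learning rate linearly with batch size so that
    lr / batch_size equals base_lr / base_batch_size."""
    return base_lr * (batch_size / base_batch_size)

model = nn.Linear(784, 10)  # stand-in model for illustration

for batch_size in (64, 128, 512, 2048):
    lr = scaled_lr(batch_size)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    print(f"batch_size={batch_size:5d}  lr={lr:.4f}  "
          f"lr/batch_size={lr / batch_size:.6f}")
```

The printed ratio stays the same for every batch size, which is the property the paper's experiments rely on when comparing small- and large-batch training.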
