no code implementations • 25 Sep 2023 • Khoi Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Nguyen-Hoang Tran, Quoc-Viet Pham
This paper explores Large Batch Training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings, uncovering insights.