5 Mar 2024 • Aditya Cowsik, Tamra Nebabu, Xiao-Liang Qi, Surya Ganguli
Our update equations show that without MLP layers, this system will collapse to a line, consistent with prior work on rank collapse in transformers.
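
The collapse described above can be illustrated numerically. The following is a minimal sketch, not the authors' update equations: it assumes a simplified stack of randomly initialized single-head self-attention layers with no MLP and no residual connections, plus a per-token rescaling to keep norms fixed. The function name `attention_only_stack` and all hyperparameters are hypothetical choices made for illustration. It tracks the ratio of the second-largest to the largest singular value of the token matrix; a ratio near zero means the tokens lie close to a single line through the origin.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def normalize_tokens(x):
    # Rescale every token (row) to norm sqrt(d_model). This is an assumed
    # stand-in for normalization, used only to keep the scale fixed.
    d_model = x.shape[1]
    return x / np.linalg.norm(x, axis=1, keepdims=True) * np.sqrt(d_model)


def attention_only_stack(n_tokens=8, d_model=32, n_layers=12, seed=0):
    """Return s_2 / s_1 of the token matrix at the input and after each layer.

    Hypothetical toy model: attention only, no MLP, no residual connections.
    """
    rng = np.random.default_rng(seed)
    x = normalize_tokens(rng.standard_normal((n_tokens, d_model)))

    def rank_one_gap(tokens):
        s = np.linalg.svd(tokens, compute_uv=False)
        return s[1] / s[0]

    ratios = [rank_one_gap(x)]
    for _ in range(n_layers):
        # Fresh random query, key, and value weights for each layer.
        wq, wk, wv = rng.standard_normal((3, d_model, d_model)) / np.sqrt(d_model)
        attn = softmax((x @ wq) @ (x @ wk).T / np.sqrt(d_model))
        # Each output token is a convex combination of the value vectors.
        x = normalize_tokens(attn @ (x @ wv))
        ratios.append(rank_one_gap(x))
    return ratios


if __name__ == "__main__":
    for layer, ratio in enumerate(attention_only_stack()):
        print(f"layer {layer:2d}: s2/s1 = {ratio:.3e}")
```

Because every attention output is a convex combination of the same value vectors, the tokens in this toy model are pulled toward one another at each layer, so the singular-value ratio shrinks with depth. Adding residual connections or MLP layers changes the dynamics, and this sketch does not attempt to model that.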