no code implementations • 2 Jul 2021 • Shih-Yu Sun, Vimal Thilak, Etai Littwin, Omid Saremi, Joshua M. Susskind
Deep linear networks trained with gradient descent yield low-rank solutions, a phenomenon typically studied in the setting of matrix factorization.
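The low-rank bias mentioned in the abstract can be illustrated with a minimal sketch (not code from the paper): a depth-2 linear network `X = A @ B`, initialized near zero and trained with plain gradient descent to match only a subset of the entries of a rank-1 target, tends to recover a near-rank-1 matrix. The target, mask, and hyperparameters below are illustrative choices, not taken from the work.

```python
import numpy as np

# Sketch of the implicit low-rank bias of deep linear networks
# (illustrative assumptions: rank-1 all-ones target, ~60% observed
# entries, small random initialization, plain gradient descent).
rng = np.random.default_rng(0)
n = 5
M = np.ones((n, n))                     # rank-1 target matrix
mask = rng.random((n, n)) < 0.6         # observe ~60% of the entries

A = 0.01 * rng.standard_normal((n, n))  # small init is key to the bias
B = 0.01 * rng.standard_normal((n, n))

lr = 0.05
for _ in range(20000):
    E = mask * (A @ B - M)              # residual on observed entries only
    # simultaneous gradient step on 0.5 * ||mask * (A @ B - M)||^2
    A, B = A - lr * (E @ B.T), B - lr * (A.T @ E)

s = np.linalg.svd(A @ B, compute_uv=False)
print("singular values:", np.round(s, 3))
```

Despite the full 5x5 parameterization, the trailing singular values of the learned product collapse toward zero, so the fitted matrix is effectively rank 1.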
no code implementations • 15 May 2019 • Chen Huang, Shuangfei Zhai, Walter Talbott, Miguel Angel Bautista, Shih-Yu Sun, Carlos Guestrin, Josh Susskind
In most machine learning training paradigms, a fixed, often handcrafted, loss function is assumed to be a good proxy for the underlying evaluation metric.