Search Results for author: Tanishq Kumar

Found 3 papers, 0 papers with code

No Free Prune: Information-Theoretic Barriers to Pruning at Initialization

no code implementations · 2 Feb 2024 · Tanishq Kumar, Kevin Luo, Mark Sellke

We put forward a theoretical explanation for the failure of pruning at initialization, based on the model's effective parameter count, $p_\text{eff}$, given by the sum of the number of non-zero weights in the final network and the mutual information between the sparsity mask and the data.
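The $p_\text{eff}$ definition above can be illustrated numerically. This is a minimal sketch, not the paper's code: the toy layer, the magnitude-pruning mask, and the 90% sparsity level are all assumptions for illustration. For a mask chosen without looking at data, the mutual-information term is zero, so $p_\text{eff}$ reduces to the nonzero-weight count:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense weights of a toy layer (illustrative; not from the paper)
w = rng.normal(size=(64, 64))

# Prune to 90% sparsity by magnitude: keep the k largest-magnitude weights
k = int(0.1 * w.size)
threshold = np.sort(np.abs(w).ravel())[-k]
mask = np.abs(w) >= threshold
w_pruned = w * mask

nonzero = int(np.count_nonzero(w_pruned))

# p_eff = (# non-zero weights) + I(mask; data).
# Assumption for this sketch: the mask was fixed without seeing any data,
# so the mutual-information term I(mask; data) is zero and p_eff is just
# the nonzero count. A data-dependent mask would add a positive MI term.
mi_term = 0.0
p_eff = nonzero + mi_term
```

Estimating the mutual-information term for a data-dependent mask is the hard part in practice; the sketch only makes the bookkeeping in the definition concrete.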

Grokking as the Transition from Lazy to Rich Training Dynamics

no code implementations · 9 Oct 2023 · Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

We identify sufficient statistics for the test loss of such a network. Tracking these over training reveals that grokking arises in this setting when the network first attempts to fit a kernel regression solution with its initial features, followed by late-time feature learning in which a generalizing solution is found after the train loss is already low.
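The lazy-to-rich distinction the abstract draws is commonly controlled by an output-scale parameter $\alpha$: training $f(x) = \alpha\,(g(w,x) - g(w_0,x))$ with learning rate $\eta/\alpha^2$ keeps features nearly frozen (kernel/lazy regime) when $\alpha$ is large and lets them move (rich regime) when $\alpha$ is small. The sketch below is an assumption-laden illustration of that scaling on toy data, not the paper's experiment; the network, data, and hyperparameters are invented for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumption: not the paper's task)
X = rng.normal(size=(32, 4))
y = np.sin(X[:, 0])

# One shared initialization for both runs
W0 = rng.normal(size=(16, 4)) * 0.5
a0 = rng.normal(size=16) * 0.25

def train(alpha, steps=200, eta=0.2):
    """Gradient descent on f(x) = alpha * (g(w, x) - g(w0, x)),
    with learning rate eta / alpha**2 (standard lazy-training scaling).
    Returns how far the first-layer features moved from initialization."""
    W, a = W0.copy(), a0.copy()
    g0 = np.tanh(X @ W0.T) @ a0       # centering term, constant in w
    lr = eta / alpha**2
    n = len(y)
    for _ in range(steps):
        H = np.tanh(X @ W.T)          # (32, 16) hidden features
        f = alpha * (H @ a - g0)      # centered, alpha-scaled output
        r = f - y                     # residuals
        grad_a = alpha * (H.T @ r) / n
        grad_W = alpha * ((r[:, None] * a[None, :] * (1 - H**2)).T @ X) / n
        a -= lr * grad_a
        W -= lr * grad_W
    return np.linalg.norm(W - W0)

move_rich = train(alpha=1.0)    # small alpha: features move substantially
move_lazy = train(alpha=100.0)  # large alpha: features barely move
```

With large $\alpha$ the first-layer weights stay close to $W_0$, so the network is effectively doing kernel regression with its initial features; shrinking $\alpha$ turns feature learning back on, which is the transition the paper connects to grokking.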
