no code implementations • 11 Dec 2023 • Dashiell Stander, Qinan Yu, Honglu Fan, Stella Biderman
We use the group Fourier transform over the symmetric group $S_n$ to reverse engineer a 1-layer feedforward network that has "grokked" the multiplication of $S_5$ and $S_6$.
5 code implementations • 31 Aug 2023 • Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole
Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models.
no code implementations • 1 Jul 2023 • Gergely Bérczi, Honglu Fan, Mingcong Zeng
The solution set of a system of polynomial equations typically contains ill-behaved, singular points.
no code implementations • 30 Jun 2023 • Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman
Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations.
Ranked #1 on Text Generation on SciQ