no code implementations • 1 Mar 2024 • Vithursan Thangarasa, Mahmoud Salem, Shreyas Saxena, Kevin Leong, Joel Hestness, Sean Lie
Large language models (LLMs) are typically trained on general source data for various domains, but a recent surge in domain-specific LLMs has shown their potential to outperform general-purpose models in domain-specific tasks (e. g., biomedicine).
Ranked #10 on Question Answering on PubMedQA
2 code implementations • 21 Mar 2023 • Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie
Recent research has focused on weight sparsity in neural network training to reduce FLOPs, aiming for improved efficiency (test accuracy w. r. t training FLOPs).
no code implementations • 18 Mar 2023 • Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis Decoste, Sean Lie, Shreyas Saxena
In this work, we show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training (Sparse Pre-training) and then recover the representational capacity by allowing the zeroed weights to learn (Dense Fine-tuning).