2 code implementations • 8 Feb 2024 • Mohammed Muqeeth, Haokun Liu, Yufan Liu, Colin Raffel
Unlike past methods that learn to route among specialized models, PHATGOOSE explores the possibility that zero-shot generalization will be improved if different experts can be adaptively chosen for each token and at each layer in the model.
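A minimal sketch of this per-token, per-layer routing idea, in PyTorch: each layer scores every token's hidden state against a learned gating vector per expert and dispatches the token to its best-scoring expert module. The class name, the gating parameterization, and the top-1 dispatch are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class TokenwiseRouter(nn.Module):
    """Route each token to one of several expert modules (one router per layer)."""

    def __init__(self, d_model: int, experts: nn.ModuleList):
        super().__init__()
        self.experts = experts  # each expert maps (.., d_model) -> (.., d_model)
        # One learned gating vector per expert.
        self.gates = nn.Parameter(torch.randn(len(experts), d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model). Score every token against every gate.
        scores = torch.einsum("bsd,ed->bse", x, self.gates)
        choice = scores.argmax(dim=-1)  # top-1 expert index per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = choice == e
            if mask.any():
                out[mask] = expert(x[mask])  # send each token to its chosen expert
        return out
```

For example, `TokenwiseRouter(512, nn.ModuleList(nn.Linear(512, 512) for _ in range(8)))` applied to a `(4, 16, 512)` tensor picks an expert independently for each of the 64 tokens, so different tokens (and different layers, each with its own router) can use different experts.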
1 code implementation • 7 Jun 2023 • Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, Colin Raffel
Currently, most machine learning models are trained by centralized teams and are rarely updated.
no code implementations • 6 Jun 2023 • Mohammed Muqeeth, Haokun Liu, Colin Raffel
To address this issue, we introduce Soft Merging of Experts with Adaptive Routing (SMEAR), which avoids discrete routing by using a single "merged" expert constructed via a weighted average of all of the experts' parameters.
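A hedged sketch of the soft-merging idea, in PyTorch: the router outputs a probability distribution over experts, and a single merged expert is formed per example as the probability-weighted average of all experts' parameters, so the whole computation stays differentiable and no discrete routing decision is made. Names and shapes below are illustrative assumptions, not the SMEAR reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftMergedExperts(nn.Module):
    """One layer of experts applied via a soft parameter merge instead of discrete routing."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        # Stacked per-expert weights/biases so they are easy to average.
        self.w = nn.Parameter(torch.randn(num_experts, d_hidden, d_model) * 0.02)
        self.b = nn.Parameter(torch.zeros(num_experts, d_hidden))
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model); soft routing distribution per example.
        probs = F.softmax(self.router(x), dim=-1)              # (batch, num_experts)
        # Build a single "merged" expert as the probability-weighted
        # average of all experts' parameters, then apply it once.
        merged_w = torch.einsum("be,ehd->bhd", probs, self.w)  # (batch, d_hidden, d_model)
        merged_b = torch.einsum("be,eh->bh", probs, self.b)    # (batch, d_hidden)
        return torch.einsum("bhd,bd->bh", merged_w, x) + merged_b
```

Because the routing probabilities enter through a weighted average rather than an argmax, gradients flow to the router directly, which is the property that discrete top-k routing lacks.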
2 code implementations • 11 May 2022 • Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel
In-context learning (ICL) incurs substantial computational, memory, and storage costs because it processes all of the training examples every time a prediction is made.
Ranked #1 on Few-Shot Text Classification on RAFT
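To make the cost claim above concrete, a back-of-the-envelope calculation (all numbers are assumed for illustration) shows how ICL's per-prediction token count scales with the number of in-context examples:

```python
# Assumed workload: 32-shot ICL, ~100 tokens per example, 10,000 predictions.
shots, tokens_per_example, num_predictions = 32, 100, 10_000

# ICL re-processes the 32 training examples plus the query for every prediction;
# a fine-tuned model processes only the query.
icl_tokens = num_predictions * (shots + 1) * tokens_per_example
finetuned_tokens = num_predictions * tokens_per_example

print(f"ICL processes ~{icl_tokens / finetuned_tokens:.0f}x more tokens")
# -> ICL processes ~33x more tokens per prediction under these assumptions
```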