no code implementations • 29 May 2024 • Simin Fan, Razvan Pascanu, Martin Jaggi
Grokking refers to a sharp rise in a network's generalization accuracy on the test set that occurs long after an extended overfitting phase, during which the network perfectly fits the training set.
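For intuition, here is a minimal sketch of the canonical setting in which grokking was first reported (a small network trained on modular addition far past perfect training accuracy), not this paper's exact experiment; the modulus, architecture, weight decay, and step count are illustrative assumptions.

```python
# Sketch of the canonical grokking setup: modular addition, a small
# network, strong weight decay, and training long past the point of
# perfect train accuracy. The delayed jump in test accuracy is grokking.
import torch
import torch.nn as nn

p = 97  # modulus for the a + b (mod p) task (assumed value)
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b)
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
split = len(pairs) // 2
train_idx, test_idx = perm[:split], perm[split:]

model = nn.Sequential(
    nn.Embedding(p, 128),          # shared embedding for both operands
    nn.Flatten(start_dim=1),       # (N, 2, 128) -> (N, 256)
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, p),
)
# Weight decay is widely reported to be important for grokking to appear.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(100_000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            tr = (model(pairs[train_idx]).argmax(-1) == labels[train_idx]).float().mean()
            te = (model(pairs[test_idx]).argmax(-1) == labels[test_idx]).float().mean()
        # Train accuracy saturates early; test accuracy jumps much later.
        print(f"step {step}: train_acc={tr:.3f} test_acc={te:.3f}")
```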
1 code implementation • 27 Nov 2023 • Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut
Large language models (LLMs) can potentially democratize access to medical knowledge.
Ranked #1 on Multiple Choice Question Answering (MCQA) on MedMCQA (dev set, accuracy %)
no code implementations • 23 Oct 2023 • Simin Fan, Matteo Pagliardini, Martin Jaggi
Moreover, when the goal is to generalize to an out-of-domain (OOD) target task that is unseen in the pretraining corpus, DoGE effectively identifies inter-domain dependencies and consistently achieves better test perplexity on the target domain.
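As a rough illustration of the gradient-alignment idea behind this kind of domain reweighting, the sketch below up-weights pretraining domains whose gradients align with the gradient of a target (generalization) objective. This is a simplified reading of the approach, not the authors' implementation; the `model.loss` method, the mirror-ascent-style update, and `lr_w` are all illustrative assumptions.

```python
# Simplified domain-reweighting sketch: each domain's sampling weight is
# increased when its gradient points in a direction that also reduces
# the target-domain loss (a crude generalization estimate).
import torch

def flat_grad(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

def update_domain_weights(model, domain_batches, target_batch, log_w, lr_w=1.0):
    # `model.loss(batch)` is a hypothetical helper returning a scalar loss.
    params = [p for p in model.parameters() if p.requires_grad]
    g_target = flat_grad(model.loss(target_batch), params)
    scores = torch.stack([
        torch.dot(flat_grad(model.loss(b), params), g_target)
        for b in domain_batches
    ])
    # Exponentiated-gradient-style update, renormalized via softmax:
    # well-aligned domains receive larger sampling weights.
    log_w = log_w + lr_w * scores
    return torch.softmax(log_w, dim=0)
```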
no code implementations • 23 Oct 2023 • Simin Fan, Martin Jaggi
Automatic data selection and curriculum design for training large language models are challenging, with only a few existing methods showing improvements over standard training.
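One family of such data-selection methods scores examples by learnability: a small proxy model supplies an "irreducible" loss per example, and examples whose current loss far exceeds it are prioritized. The sketch below illustrates that general idea (in the spirit of irreducible-loss curricula), not necessarily this paper's exact algorithm; all names are illustrative.

```python
# Learnability-based batch selection sketch. `loss_fn` must return
# per-example losses, e.g. nn.CrossEntropyLoss(reduction="none").
import torch

def select_batch(model, proxy_model, candidates, labels, k, loss_fn):
    """Pick the k most learnable examples from a candidate pool."""
    with torch.no_grad():
        main_loss = loss_fn(model(candidates), labels)          # per-example
        irreducible = loss_fn(proxy_model(candidates), labels)  # proxy floor
    # High difference = still learnable; low difference = learned or noisy.
    learnability = main_loss - irreducible
    topk = torch.topk(learnability, k).indices
    return candidates[topk], labels[topk]
```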
1 code implementation • NAACL 2022 • Xu Wang, Simin Fan, Jessica Houghton, Lu Wang
NLP-powered automatic question generation (QG) techniques hold great pedagogical potential: they can save educators' time and benefit student learning.
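To make the task concrete, here is a minimal sketch of answer-aware question generation with a generic seq2seq model from the `transformers` library. The checkpoint name and the highlight-token input convention are illustrative assumptions, not the paper's system; in practice one would use a model fine-tuned for QG.

```python
# Answer-aware question generation sketch: mark the answer span in the
# context and ask a T5-style model to generate a question about it.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-base"  # assumption: stands in for a QG-fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

context = "The mitochondrion is the organelle that produces most of the cell's ATP."
answer = "mitochondrion"
# One common convention: wrap the answer span in <hl> highlight tokens.
prompt = f"generate question: {context.replace(answer, f'<hl> {answer} <hl>')}"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```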