no code implementations • 6 Feb 2024 • Shahriar Golchin, Nikhil Garuda, Christopher Impey, Matthew Wenger
Specifically, we focus on two state-of-the-art LLMs: GPT-4 and GPT-3.5, across three distinct courses: Introductory Astronomy, Astrobiology, and the History and Philosophy of Astronomy.
1 code implementation • 10 Nov 2023 • Shahriar Golchin, Mihai Surdeanu
We propose the Data Contamination Quiz (DCQ), a simple and effective approach to detect data contamination in large language models (LLMs) and estimate the amount of it.
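As described in the abstract, the DCQ detects contamination by quizzing the model itself. A minimal sketch of how one quiz question might be assembled, assuming the quiz pits a verbatim dataset instance against reworded variants (the option count, wording of the question, and the source of the paraphrases are all assumptions here, not details from the snippet):

```python
import random


def build_dcq_question(original, paraphrases, seed=0):
    """Format one hypothetical Data Contamination Quiz question.

    `original` is a verbatim dataset instance; `paraphrases` are reworded
    variants assumed to be produced elsewhere (e.g., by another LLM).
    Returns the question text and the letter of the correct option.
    """
    rng = random.Random(seed)
    options = [original] + list(paraphrases)
    rng.shuffle(options)  # hide the original among the paraphrases
    answer_letter = chr(ord("A") + options.index(original))
    lines = ["Which of the following did you see during training?"]
    for i, option in enumerate(options):
        lines.append(f"{chr(ord('A') + i)}. {option}")
    return "\n".join(lines), answer_letter
```

Under this reading, a model that picks the verbatim option well above chance is suspected of having seen the instance during training.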
1 code implementation • 16 Aug 2023 • Shahriar Golchin, Mihai Surdeanu
To estimate contamination of individual instances, we employ "guided instruction": a prompt consisting of the dataset name, partition type, and a random-length initial segment of a reference instance, which asks the LLM to complete it.
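The guided-instruction prompt above can be sketched as a simple template. The exact instruction wording is an assumption; only the three ingredients (dataset name, partition type, initial segment) come from the abstract:

```python
def guided_instruction_prompt(dataset_name, partition, initial_segment):
    """Build a hypothetical 'guided instruction' prompt.

    The phrasing is illustrative; the abstract only specifies that the
    prompt names the dataset, the partition, and supplies the opening
    segment of a reference instance for the LLM to complete.
    """
    return (
        f"You are given the first part of an instance from the {partition} "
        f"split of the {dataset_name} dataset.\n"
        f"Complete the instance exactly as it appears in the dataset.\n\n"
        f"Instance: {initial_segment}"
    )
```

A completion that closely matches the held-back remainder of the reference instance would then be evidence of contamination.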
no code implementations • 14 Jul 2023 • Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour
We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning.
no code implementations • 25 Aug 2022 • Shahriar Golchin, Mihai Surdeanu, Nazgol Tavabi, Ata Kiapour
We construct these compact subsets from the unstructured data using a combination of abstractive summaries and extractive keywords.
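A toy sketch of combining an abstractive summary with extractive keywords into one compact snippet. The paper's actual summarizer and keyword extractor are not specified in the snippet; the frequency-based keyword selection and the output layout below are stand-in assumptions:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def compact_subset(document, summary, num_keywords=5):
    """Join an abstractive summary (assumed given) with extractive keywords.

    Keywords here are just the most frequent non-stopword tokens of the
    unstructured document, as a stand-in for a proper extractor.
    """
    words = re.findall(r"[a-z]+", document.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    keywords = [w for w, _ in counts.most_common(num_keywords)]
    return summary + "\nKeywords: " + ", ".join(keywords)
```

The resulting snippets would serve as the compact in-domain corpus for the intermediate pre-training stage described above.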