Continual Pretraining

20 papers with code • 3 benchmarks • 3 datasets

Continual pretraining continues the self-supervised pretraining of an existing model on new unlabeled corpora (a new domain, task distribution, or sequence-length regime), adapting the model without retraining it from scratch.

Most implemented papers

Continual Training of Language Models for Few-Shot Learning

uic-liu-lab/cpt 11 Oct 2022

Recent work applying large language models (LMs) has achieved impressive performance in many NLP applications.

ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

pluslabnlp/econet EMNLP 2021

While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle with tasks that require event temporal reasoning, which is essential for event-centric applications.

Continual Pre-training of Language Models

UIC-Liu-Lab/ContinualLM 7 Feb 2023

A novel proxy is also proposed to preserve the general knowledge in the original LM.
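
The entry above concerns domain-adaptive continual pretraining. Below is a minimal sketch of the general recipe it builds on: continue the masked-LM objective on an unlabeled domain corpus, starting from a general-domain checkpoint. The roberta-base checkpoint, placeholder corpus, and hyperparameters are illustrative assumptions, and the paper's knowledge-preservation proxy is not implemented here.

```python
# Minimal sketch: continue RoBERTa's masked-LM pretraining on a new domain corpus.
# The domain texts and hyperparameters are placeholders, not the paper's setup.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")  # start from the general-domain LM

domain_texts = ["<unlabeled sentence from the new domain>", "<another domain sentence>"]
encodings = [tokenizer(t, truncation=True, max_length=128) for t in domain_texts]

collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
loader = DataLoader(encodings, batch_size=8, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for batch in loader:
    loss = model(**batch).loss   # standard masked-LM loss, now on domain text
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```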

Towards Geospatial Foundation Models via Continual Pretraining

mmendiet/gfm ICCV 2023

Geospatial technologies are becoming increasingly essential in our world for a wide range of applications, including agriculture, urban planning, and disaster response.

Autonomous Data Selection with Language Models for Mathematical Texts

hiyouga/llama-factory 12 Feb 2024

Our method showcases a twofold increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities.
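
As a rough illustration of LM-scored data selection, the sketch below asks a small causal LM whether a passage is useful mathematical text and keeps passages whose "Yes" probability is high. The GPT-2 checkpoint, prompt wording, math_quality_score helper, and 0.5 threshold are assumptions for illustration, not the paper's meta-prompt setup.

```python
# Toy sketch of LM-based data selection for mathematical text.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def math_quality_score(passage: str) -> float:
    """Probability mass the LM puts on ' Yes' vs ' No' when asked about the passage."""
    prompt = f"Passage:\n{passage}\n\nIs this passage useful mathematical text? Answer:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]      # next-token distribution
    probs = torch.softmax(logits, dim=-1)
    yes_id = tokenizer(" Yes")["input_ids"][0]
    no_id = tokenizer(" No")["input_ids"][0]
    return (probs[yes_id] / (probs[yes_id] + probs[no_id])).item()

corpus = ["Let f(x) = x^2. Then f'(x) = 2x.", "Buy cheap watches online!!!"]
selected = [doc for doc in corpus if math_quality_score(doc) > 0.5]  # keep high-scoring docs
```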

Data Engineering for Scaling Language Models to 128K Context

franxyao/long-context-data-engineering 15 Feb 2024

We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
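
A hedged sketch of the data-side idea: build a continual-pretraining mixture that upsamples long documents so the model actually sees sequences near the target context length. The length threshold, upsampling factor, and build_long_context_mixture helper are illustrative assumptions, not the paper's exact per-source recipe.

```python
# Sketch: sample a continual-pretraining mixture in which long documents are upsampled.
import random

def build_long_context_mixture(docs, long_threshold=32_000, upsample_factor=4, n_samples=1000):
    """docs: list of (text, n_tokens) pairs. Returns a sampled training list."""
    weights = [upsample_factor if n_tokens >= long_threshold else 1 for _, n_tokens in docs]
    return random.choices(docs, weights=weights, k=n_samples)

corpus = [("short web page ...", 800), ("full book text ...", 120_000)]
mixture = build_long_context_mixture(corpus)  # long documents appear ~4x more often
```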

Rho-1: Not All Tokens Are What You Need

microsoft/rho 11 Apr 2024

After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.
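
Rho-1's title points at selective token-level training; the sketch below illustrates one way to realize that idea, back-propagating only through the tokens with the largest excess loss relative to a frozen reference model. The selective_lm_loss helper, the 60% keep ratio, and the random tensors standing in for model outputs are assumptions for illustration.

```python
# Sketch: per-token losses under the training and reference models; train on the
# tokens where the training model still lags the reference the most.
import torch
import torch.nn.functional as F

def selective_lm_loss(train_logits, ref_logits, labels, keep_ratio=0.6):
    """train_logits, ref_logits: (seq, vocab); labels: (seq,) next-token ids."""
    train_ce = F.cross_entropy(train_logits, labels, reduction="none")  # per-token loss
    with torch.no_grad():
        ref_ce = F.cross_entropy(ref_logits, labels, reduction="none")
        excess = train_ce.detach() - ref_ce          # tokens the model still gets wrong
        k = max(1, int(keep_ratio * labels.numel()))
        keep = torch.topk(excess, k).indices         # highest-excess tokens
    return train_ce[keep].mean()                     # train only on the selected tokens

# toy usage with random tensors standing in for model outputs
vocab, seq = 100, 16
loss = selective_lm_loss(torch.randn(seq, vocab, requires_grad=True),
                         torch.randn(seq, vocab),
                         torch.randint(0, vocab, (seq,)))
loss.backward()
```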

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

bionlu-coling2024/biomed-ner-intent_detection 31 Jul 2020

In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.

Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning

vano1205/efficientcl EMNLP 2021

We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.
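
A minimal InfoNCE sketch of the contrastive ingredient: two augmented views of each example are pulled together and pushed away from the rest of the batch. The dropout-based augmentation and temperature below are illustrative assumptions; EfficientCL's specific augmentations and curriculum schedule are not reproduced here.

```python
# Sketch: InfoNCE loss over two augmented views of a batch of sequence embeddings.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same examples."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature      # pairwise similarities
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

batch = torch.randn(32, 768)                                 # stand-in for encoder outputs
view1, view2 = F.dropout(batch, 0.1), F.dropout(batch, 0.1)  # toy augmentation
loss = info_nce(view1, view2)
```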

On the Robustness of Reading Comprehension Models to Entity Renaming

ink-usc/entity-robustness NAACL 2022

We study the robustness of machine reading comprehension (MRC) models to entity renaming: do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?
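
The probe can be pictured as a simple perturbation of MRC examples: rename the entity consistently in the passage, question, and gold answers, then compare predictions before and after. The rename_entity helper and example below are illustrative, not the paper's exact pipeline.

```python
# Sketch: consistently rename an entity across context, question, and answers.
def rename_entity(example: dict, old_name: str, new_name: str) -> dict:
    return {
        "context": example["context"].replace(old_name, new_name),
        "question": example["question"].replace(old_name, new_name),
        # gold answers that mention the entity must be renamed consistently as well
        "answers": [a.replace(old_name, new_name) for a in example["answers"]],
    }

example = {
    "context": "Marie Curie won two Nobel Prizes.",
    "question": "How many Nobel Prizes did Marie Curie win?",
    "answers": ["two"],
}
perturbed = rename_entity(example, "Marie Curie", "Alice Nguyen")
```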