Continual Pretraining

20 papers with code • 3 benchmarks • 3 datasets

Continual pretraining continues the self-supervised pretraining of an existing model on new unlabeled corpora (a new domain, task distribution, or sequence-length regime), adapting the model without retraining it from scratch.

Most implemented papers

Continual Training of Language Models for Few-Shot Learning

uic-liu-lab/cpt 11 Oct 2022

Recent work applying large language models (LMs) has achieved impressive performance in many NLP applications.

ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning

pluslabnlp/econet EMNLP 2021

While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle with tasks that require event temporal reasoning, which is essential for event-centric applications.

Continual Pre-training of Language Models

UIC-Liu-Lab/ContinualLM 7 Feb 2023

A novel proxy is also proposed to preserve the general knowledge in the original LM.
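
The entry above concerns domain-adaptive continual pretraining. Below is a minimal sketch of the general recipe it builds on: continue the masked-LM objective on an unlabeled domain corpus, starting from a general-domain checkpoint. The roberta-base checkpoint, placeholder corpus, and hyperparameters are illustrative assumptions, and the paper's knowledge-preservation proxy is not implemented here.

```python
# Minimal sketch: continue RoBERTa's masked-LM pretraining on a new domain corpus.
# The domain texts and hyperparameters are placeholders, not the paper's setup.
import torch
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")  # start from the general-domain LM

domain_texts = ["<unlabeled sentence from the new domain>", "<another domain sentence>"]
encodings = [tokenizer(t, truncation=True, max_length=128) for t in domain_texts]

collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)
loader = DataLoader(encodings, batch_size=8, shuffle=True, collate_fn=collator)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for batch in loader:
    loss = model(**batch).loss   # standard masked-LM loss, now on domain text
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```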

Towards Geospatial Foundation Models via Continual Pretraining

mmendiet/gfm ICCV 2023

Geospatial technologies are becoming increasingly essential in our world for a wide range of applications, including agriculture, urban planning, and disaster response.

Autonomous Data Selection with Language Models for Mathematical Texts

hiyouga/llama-factory 12 Feb 2024

Our method showcases a twofold increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities.
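
As a rough illustration of LM-scored data selection, the sketch below asks a small causal LM whether a passage is useful mathematical text and keeps passages whose "Yes" probability is high. The GPT-2 checkpoint, prompt wording, math_quality_score helper, and 0.5 threshold are assumptions for illustration, not the paper's meta-prompt setup.

```python
# Toy sketch of LM-based data selection for mathematical text.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def math_quality_score(passage: str) -> float:
    """Probability mass the LM puts on ' Yes' vs ' No' when asked about the passage."""
    prompt = f"Passage:\n{passage}\n\nIs this passage useful mathematical text? Answer:"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]      # next-token distribution
    probs = torch.softmax(logits, dim=-1)
    yes_id = tokenizer(" Yes")["input_ids"][0]
    no_id = tokenizer(" No")["input_ids"][0]
    return (probs[yes_id] / (probs[yes_id] + probs[no_id])).item()

corpus = ["Let f(x) = x^2. Then f'(x) = 2x.", "Buy cheap watches online!!!"]
selected = [doc for doc in corpus if math_quality_score(doc) > 0.5]  # keep high-scoring docs
```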

Data Engineering for Scaling Language Models to 128K Context

franxyao/long-context-data-engineering 15 Feb 2024

We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
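
A hedged sketch of the data-side idea: build a continual-pretraining mixture that upsamples long documents so the model actually sees sequences near the target context length. The length threshold, upsampling factor, and build_long_context_mixture helper are illustrative assumptions, not the paper's exact per-source recipe.

```python
# Sketch: sample a continual-pretraining mixture in which long documents are upsampled.
import random

def build_long_context_mixture(docs, long_threshold=32_000, upsample_factor=4, n_samples=1000):
    """docs: list of (text, n_tokens) pairs. Returns a sampled training list."""
    weights = [upsample_factor if n_tokens >= long_threshold else 1 for _, n_tokens in docs]
    return random.choices(docs, weights=weights, k=n_samples)

corpus = [("short web page ...", 800), ("full book text ...", 120_000)]
mixture = build_long_context_mixture(corpus)  # long documents appear ~4x more often
```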

Rho-1: Not All Tokens Are What You Need

microsoft/rho 11 Apr 2024

After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.
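
Rho-1's title points at selective token-level training; the sketch below illustrates one way to realize that idea, back-propagating only through the tokens with the largest excess loss relative to a frozen reference model. The selective_lm_loss helper, the 60% keep ratio, and the random tensors standing in for model outputs are assumptions for illustration.

```python
# Sketch: per-token losses under the training and reference models; train on the
# tokens where the training model still lags the reference the most.
import torch
import torch.nn.functional as F

def selective_lm_loss(train_logits, ref_logits, labels, keep_ratio=0.6):
    """train_logits, ref_logits: (seq, vocab); labels: (seq,) next-token ids."""
    train_ce = F.cross_entropy(train_logits, labels, reduction="none")  # per-token loss
    with torch.no_grad():
        ref_ce = F.cross_entropy(ref_logits, labels, reduction="none")
        excess = train_ce.detach() - ref_ce          # tokens the model still gets wrong
        k = max(1, int(keep_ratio * labels.numel()))
        keep = torch.topk(excess, k).indices         # highest-excess tokens
    return train_ce[keep].mean()                     # train only on the selected tokens

# toy usage with random tensors standing in for model outputs
vocab, seq = 100, 16
loss = selective_lm_loss(torch.randn(seq, vocab, requires_grad=True),
                         torch.randn(seq, vocab),
                         torch.randint(0, vocab, (seq,)))
loss.backward()
```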

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

bionlu-coling2024/biomed-ner-intent_detection 31 Jul 2020

In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.

Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning

vano1205/efficientcl EMNLP 2021

We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.
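
A minimal InfoNCE sketch of the contrastive ingredient: two augmented views of each example are pulled together and pushed away from the rest of the batch. The dropout-based augmentation and temperature below are illustrative assumptions; EfficientCL's specific augmentations and curriculum schedule are not reproduced here.

```python
# Sketch: InfoNCE loss over two augmented views of a batch of sequence embeddings.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same examples."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature      # pairwise similarities
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

batch = torch.randn(32, 768)                                 # stand-in for encoder outputs
view1, view2 = F.dropout(batch, 0.1), F.dropout(batch, 0.1)  # toy augmentation
loss = info_nce(view1, view2)
```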

On the Robustness of Reading Comprehension Models to Entity Renaming

ink-usc/entity-robustness NAACL 2022

We study the robustness of machine reading comprehension (MRC) models to entity renaming: do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?
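
The probe can be pictured as a simple perturbation of MRC examples: rename the entity consistently in the passage, question, and gold answers, then compare predictions before and after. The rename_entity helper and example below are illustrative, not the paper's exact pipeline.

```python
# Sketch: consistently rename an entity across context, question, and answers.
def rename_entity(example: dict, old_name: str, new_name: str) -> dict:
    return {
        "context": example["context"].replace(old_name, new_name),
        "question": example["question"].replace(old_name, new_name),
        # gold answers that mention the entity must be renamed consistently as well
        "answers": [a.replace(old_name, new_name) for a in example["answers"]],
    }

example = {
    "context": "Marie Curie won two Nobel Prizes.",
    "question": "How many Nobel Prizes did Marie Curie win?",
    "answers": ["two"],
}
perturbed = rename_entity(example, "Marie Curie", "Alice Nguyen")
```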