Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media

2 Oct 2020 · Xiang Dai, Sarvnaz Karimi, Ben Hachey, Cecile Paris

Recent studies on domain-specific BERT models show that effectiveness on downstream tasks can be improved when models are pretrained on in-domain data. Often, the pretraining data used in these models are selected based on their subject matter, e.g., biology or computer science...
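The truncated abstract does not show how the paper itself selects data, but subject-matter-based selection can be illustrated with a minimal sketch: score each candidate corpus by the overlap between its frequent vocabulary and that of the target-task text, then keep the best-matching corpora. The function names, the toy corpora, and the vocabulary-overlap heuristic below are all illustrative assumptions, not the paper's method.

```python
from collections import Counter

def vocab_overlap(candidate_docs, target_docs, top_n=1000):
    """Score a candidate pretraining corpus by vocabulary overlap:
    the fraction of the target text's top-n frequent words that also
    appear among the candidate's top-n frequent words.
    (Illustrative heuristic, not the paper's selection criterion.)"""
    def top_words(docs):
        counts = Counter(w for d in docs for w in d.lower().split())
        return {w for w, _ in counts.most_common(top_n)}
    cand, tgt = top_words(candidate_docs), top_words(target_docs)
    return len(cand & tgt) / max(len(tgt), 1)

def select_corpora(candidates, target_docs, k=1):
    """Rank candidate corpora (name -> list of documents) by overlap
    with the target-task text and keep the top k."""
    scored = sorted(candidates.items(),
                    key=lambda kv: vocab_overlap(kv[1], target_docs),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Toy example: pick the candidate corpus closest to social-media target text.
target = ["the flu shot gave me a sore arm lol",
          "anyone else get headaches after the flu shot"]
candidates = {
    "social_media": ["lol my arm is sore", "headaches are the worst tbh"],
    "news": ["the government announced a new vaccination policy today"],
}
print(select_corpora(candidates, target, k=1))  # → ['social_media']
```

In practice such lexical heuristics are cheap to compute over large candidate corpora, which is what makes similarity-based selection cost-effective compared with pretraining on every candidate and measuring downstream performance.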

