RoBERTa is an extension of BERT with changes to the pretraining procedure. The modifications include:

  • training the model longer, with bigger batches, over more data
  • removing the next sentence prediction objective
  • training on longer sequences
  • dynamically changing the masking pattern applied to the training data (a minimal sketch of this follows the source note below)

The authors also collect a large new dataset (CC-News), comparable in size to other privately used datasets, to better control for training set size effects.
Source: RoBERTa: A Robustly Optimized BERT Pretraining Approach
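Of these modifications, dynamic masking is the most algorithmic, so a small example helps. Below is a minimal sketch of the idea, assuming a generic token-id interface; the constants and the function name are illustrative, not the authors' implementation. Static masking (original BERT) applies this procedure once during preprocessing and reuses the result; dynamic masking resamples it on every pass over the data.

```python
import random

MASK_ID = 4          # illustrative [MASK] token id (assumption, not RoBERTa's actual id)
VOCAB_SIZE = 50265   # size of RoBERTa's BPE vocabulary
MASK_PROB = 0.15     # fraction of tokens selected for prediction

def dynamic_mask(token_ids):
    """Sample a fresh BERT-style masking pattern over a token-id sequence.

    For dynamic masking, call this inside the data loader so the same
    sequence is masked differently every time it is seen; static masking
    would run it once ahead of time and cache the result.
    """
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)  # -100: position ignored by the loss
    for i, tok in enumerate(token_ids):
        if random.random() < MASK_PROB:
            labels[i] = tok                                # predict the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                        # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)   # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels
```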

Latest Papers

PAPER DATE
UniCase -- Rethinking Casing in Language Models
Rafal Powalski, Tomasz Stanislawek
2020-10-22
Detection of COVID-19 informative tweets using RoBERTa
Sirigireddy Dhanalaxmi, Rohit Agarwal, Aman Sinha
2020-10-21
AutoMeTS: The Autocomplete for Medical Text Simplification
Hoang Van, David Kauchak, Gondy Leroy
2020-10-20
ConjNLI: Natural Language Inference Over Conjunctive Sentences
Swarnadeep Saha, Yixin Nie, Mohit Bansal
2020-10-20
Explaining and Improving Model Behavior with k Nearest Neighbor Representations
Nazneen Fatema Rajani, Ben Krause, Wengpeng Yin, Tong Niu, Richard Socher, Caiming Xiong
2020-10-18
Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets
Chuanrong Li, Lin Shengshuo, Leo Z. Liu, Xinyi Wu, Xuhui Zhou, Shane Steinert-Threlkeld
2020-10-16
Neural Deepfake Detection with Factual Structure of Text
Wanjun Zhong, Duyu Tang, Zenan Xu, Ruize Wang, Nan Duan, Ming Zhou, Jiahai Wang, Jian Yin
2020-10-15
Aspect-based Document Similarity for Research Papers
Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm
2020-10-13
Chatbot Interaction with Artificial Intelligence: Human Data Augmentation with T5 and Language Transformer Ensemble for Text Classification
Jordan J. Bird, Anikó Ekárt, Diego R. Faria
2020-10-12
Probing Pretrained Language Models for Lexical Semantics
Ivan Vulić, Edoardo Maria Ponti, Robert Litschko, Goran Glavaš, Anna Korhonen
2020-10-12
EFSG: Evolutionary Fooling Sentences Generator
Marco Di Giovanni, Marco Brambilla
2020-10-12
From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
Steffen Eger, Yannik Benz
2020-10-12
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman
2020-10-11
Data Agnostic RoBERTa-based Natural Language to SQL Query Generation
Debaditya Pal, Harsh Sharma, Kaustubh Chaudhari
2020-10-11
Compressing Transformer-Based Semantic Parsing Models using Compositional Code Embeddings
Prafull Prakash, Saurabh Kumar Shashidhar, Wenlong Zhao, Subendhu Rongali, Haidar Khan, Michael Kayser
2020-10-10
NutCracker at WNUT-2020 Task 2: Robustly Identifying Informative COVID-19 Tweets using Ensembling and Adversarial Training
Priyanshu Kumar, Aadarsh Singh
2020-10-09
Deep Learning Meets Projective Clustering
Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman
2020-10-08
On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers
Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, Dietrich Klakow
2020-10-06
How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
Shayne Longpre, Yu Wang, Christopher DuBois
2020-10-05
Self-training Improves Pre-training for Natural Language Understanding
Jingfei Du, Edouard Grave, Beliz Gunel, Vishrav Chaudhary, Onur Celebi, Michael Auli, Ves Stoyanov, Alexis Conneau
2020-10-05
Mining Knowledge for Natural Language Inference from Wikipedia Categories
Mingda Chen, Zewei Chu, Karl Stratos, Kevin Gimpel
2020-10-03
BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context
Jean-Philippe Corbeil, Hadi Abdi Ghadivel
2020-09-25
On Data Augmentation for Extreme Multi-label Classification
Danqing Zhang, Tao Li, Haiyang Zhang, Bing Yin
2020-09-22
Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application
Chris J. Kennedy, Geoff Bacon, Alexander Sahn, Claudia von Vacano
2020-09-22
Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding
2020-09-17
Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA
Ieva Staliūnaitė, Ignacio Iacobacci
2020-09-17
Solomon at SemEval-2020 Task 11: Ensemble Architecture for Fine-Tuned Propaganda Detection in News Articles
Mayank Raj, Ajay Jaiswal, Rohit R. R, Ankita Gupta, Sudeep Kumar Sahoo, Vertika Srivastava, Yeon Hyang Kim
2020-09-16
BoostingBERT: Integrating Multi-Class Boosting into BERT for NLP Tasks
Tongwen Huang, Qingyun She, Junlin Zhang
2020-09-13
CIA_NITT at WNUT-2020 Task 2: Classification of COVID-19 Tweets Using Pre-trained Language Models
Yandrapati Prakash Babu, Rajagopal Eswari
2020-09-12
Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
Murad Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman
2020-09-11
UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction
Andrei-Marius Avram, Dumitru-Clementin Cercel, Costin-Gabriel Chiru
2020-09-11
Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection
Taesun Whang, Dongyub Lee, Dongsuk Oh, Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee
2020-09-10
Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability
Mayank Chhipa, Hrushikesh Mahesh Vazurkar, Abhijeet Kumar, Mridul Mishra
2020-09-09
ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model
Zhengjie Huang, Shikun Feng, Weiyue Su, Xuyi Chen, Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun
2020-09-08
EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets
Nickil Maveli
2020-09-06
QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model
Pai Liu
2020-09-06
Accenture at CheckThat! 2020: If you say so: Post-hoc fact-checking of claims using transformer-based models
Evan Williams, Paul Rodrigues, Valerie Novak
2020-09-05
Conceptualized Representation Learning for Chinese Biomedical Text Mining
Ningyu Zhang, Qianghuai Jia, Kangping Yin, Liang Dong, Feng Gao, Nengwei Hua
2020-08-25
HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange, Nirant Kasliwal
2020-08-22
KR-BERT: A Small-Scale Korean-Specific Language Model
Sangah Lee, Hansol Jang, Yunmee Baik, Hyopil Shin
2020-08-10
SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual Media
Amirreza Shirani, Franck Dernoncourt, Nedim Lipka, Paul Asente, Jose Echevarria, Thamar Solorio
2020-08-07
aschern at SemEval-2020 Task 11: It Takes Three to Tango: RoBERTa, CRF, and Transfer Learning
Anton Chernyavskiy, Dmitry Ilvovsky, Preslav Nakov
2020-08-06
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining
TJ Tsai, Kevin Ji
2020-07-29
BUT-FIT at SemEval-2020 Task 5: Automatic detection of counterfactual statements with deep pre-trained language representation models
Martin Fajcik, Josef Jon, Martin Docekal, Pavel Smrz
2020-07-28
Variants of BERT, Random Forests and SVM approach for Multimodal Emotion-Target Sub-challenge
Hoang Manh Hung, Hyung-Jeong Yang, Soo-Hyung Kim, Guee-Sang Lee
2020-07-28
problemConquero at SemEval-2020 Task 12: Transformer and Soft label-based approaches
Karishma Laud, Jagriti Singh, Randeep Kumar Sahu, Ashutosh Modi
2020-07-21
newsSweeper at SemEval-2020 Task 11: Context-Aware Rich Feature Representations For Propaganda Classification
Paramansh Singh, Siraj Sandhu, Subham Kumar, Ashutosh Modi
2020-07-21
AdapterHub: A Framework for Adapting Transformers
Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, Iryna Gurevych
2020-07-15
Contrastive Code Representation Learning
Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E. Gonzalez, Ion Stoica
2020-07-09
Deep Contextual Embeddings for Address Classification in E-commerce
Shreyas Mangalgi, Lakshya Kumar, Ravindra Babu Tallamraju
2020-07-06
Robust Prediction of Punctuation and Truecasing for Medical ASR
Monica Sunkara, Srikanth Ronanki, Kalpit Dixit, Sravan Bodapati, Katrin Kirchhoff
2020-07-04
Transformers on Sarcasm Detection with Context
2020-07-01
How does BERT's attention change when you fine-tune? An analysis methodology and a case study in negation scope
Yiyun Zhao, Steven Bethard
2020-07-01
Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?
Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman
2020-07-01
SUPP.AI: finding evidence for supplement-drug interactions
Lucy Wang, Oyvind Tafjord, Arman Cohan, Sarthak Jain, Sam Skjonsberg, Carissa Schoenick, Nick Botner, Waleed Ammar
2020-07-01
Neural Sarcasm Detection using Conversation Context
Nikhil Jaiswal
2020-07-01
IlliniMet: Illinois System for Metaphor Detection with Contextual and Linguistic Information
Hongyu Gong, Kshitij Gupta, Akriti Jain, Suma Bhat
2020-07-01
A Transformer Approach to Contextual Sarcasm Detection in Twitter
Hunter Gregory, Steven Li, Pouya Mohammadi, Natalie Tarn, Rachel Draelos, Cynthia Rudin
2020-07-01
RobertNLP at the IWPT 2020 Shared Task: Surprisingly Simple Enhanced UD Parsing for English
Stefan Grünewald, Annemarie Friedrich
2020-07-01
Robust Prediction of Punctuation and Truecasing for Medical ASR
Monica Sunkara, Srikanth Ronanki, Kalpit Dixit, Sravan Bodapati, Katrin Kirchhoff
2020-07-01
Modelling Context and Syntactical Features for Aspect-based Sentiment Analysis
Minh Hieu Phan, Philip O. Ogunbona
2020-07-01
Want to Identify, Extract and Normalize Adverse Drug Reactions in Tweets? Use RoBERTa
Katikapalli Subramanyam Kalyan, S. Sangeetha
2020-06-29
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow
2020-06-08
Medical Concept Normalization in User Generated Texts by Learning Target Concept Embeddings
Katikapalli Subramanyam Kalyan, S. Sangeetha
2020-06-07
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
2020-06-05
Emergence of Separable Manifolds in Deep Language Representations
Jonathan Mamou, Hang Le, Miguel Del Rio, Cory Stephenson, Hanlin Tang, Yoon Kim, SueYeon Chung
2020-06-01
BERT-based Ensembles for Modeling Disclosure and Support in Conversational Social Media Text
Tanvi Dadu, Kartikey Pant, Radhika Mamidi
2020-06-01
Language Representation Models for Fine-Grained Sentiment Classification
Brian Cheang, Bailey Wei, David Kogan, Howey Qiu, Masud Ahmed
2020-05-27
L2R2: Leveraging Ranking for Abductive Reasoning
Yunchang Zhu, Liang Pang, Yanyan Lan, Xueqi Cheng
2020-05-22
Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models
Mengxi Wei, Yifan He, Qiong Zhang
2020-05-22
BERTweet: A pre-trained language model for English Tweets
Dat Quoc Nguyen, Thanh Vu, Anh Tuan Nguyen
2020-05-20
Adversarial Training for Commonsense Inference
Lis Pereira, Xiaodong Liu, Fei Cheng, Masayuki Asahara, Ichiro Kobayashi
2020-05-17
On the Robustness of Language Encoders against Grammatical Errors
Fan Yin, Quanyu Long, Tao Meng, Kai-Wei Chang
2020-05-12
How Context Affects Language Models' Factual Predictions
Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
2020-05-10
Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh
2020-05-08
Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
Vikas Yadav, Steven Bethard, Mihai Surdeanu
2020-05-04
Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models
Bill Yuchen Lin, Seyeon Lee, Rahul Khanna, Xiang Ren
2020-05-02
IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization
Wenxuan Zhou, Bill Yuchen Lin, Xiang Ren
2020-05-02
HipoRank: Incorporating Hierarchical and Positional Information into Graph-based Unsupervised Long Document Extractive Summarization
Yue Dong, Andrei Romascanu, Jackie C. K. Cheung
2020-05-01
Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?
Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, Richard Yuanzhe Pang, Clara Vania, Katharina Kann, Samuel R. Bowman
2020-05-01
Aggression Identification in English, Hindi and Bangla Text using BERT, RoBERTa and SVM
Arup Baruah, Kaushik Das, Ferdous Barbhuiya, Kuntal Dey
2020-05-01
Revisiting Pre-Trained Models for Chinese Natural Language Processing
Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, Guoping Hu
2020-04-29
Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
Mengjie Zhao, Tao Lin, Martin Jaggi, Hinrich Schütze
2020-04-26
Classification of Cuisines from Sequentially Structured Recipes
Tript Sharma, Utkarsh Upadhyay, Ganesh Bagler
2020-04-26
Contextualized Representations Using Textual Encyclopedic Knowledge
Mandar Joshi, Kenton Lee, Yi Luan, Kristina Toutanova
2020-04-24
Collecting Entailment Data for Pretraining: New Protocols and Negative Results
Samuel R. Bowman, Jennimaria Palomaki, Livio Baldini Soares, Emily Pitler
2020-04-24
UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection
Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann
2020-04-23
Residual Energy-Based Models for Text Generation
Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato
2020-04-22
StereoSet: Measuring stereotypical bias in pretrained language models
Moin Nadeem, Anna Bethke, Siva Reddy
2020-04-20
Adversarial Training for Large Neural Language Models
Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao
2020-04-20
Learning-to-Rank with BERT in TF-Ranking
Shuguang Han, Xuanhui Wang, Mike Bendersky, Marc Najork
2020-04-17
Training with Quantization Noise for Extreme Model Compression
Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Herve Jegou, Armand Joulin
2020-04-15
A Simple Yet Strong Pipeline for HotpotQA
Dirk Groeneveld, Tushar Khot, Mausam, Ashish Sabharwal
2020-04-14
Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction
Hong Guan, Jianfu Li, Hua Xu, Murthy Devarakonda
2020-04-13
Poor Man's BERT: Smaller and Faster Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
2020-04-08
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu
2020-04-08
Severing the Edge Between Before and After: Neural Architectures for Temporal Ordering of Events
Miguel Ballesteros, Rishita Anubhai, Shuai Wang, Nima Pourdamghani, Yogarshi Vyas, Jie Ma, Parminder Bhatia, Kathleen McKeown, Yaser Al-Onaizan
2020-04-08
TextGAIL: Generative Adversarial Imitation Learning for Text Generation
Qingyang Wu, Lei Li, Zhou Yu
2020-04-07
Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
Changmao Li, Jinho D. Choi
2020-04-07
Improved Pretraining for Domain-specific Contextual Embedding Models
Subendhu Rongali, Abhyuday Jagannatha, Bhanu Pratap Singh Rawat, Hong Yu
2020-04-05
Gestalt: a Stacking Ensemble for SQuAD2.0
Mohamed El-Geish
2020-04-02
Deep Entity Matching with Pre-Trained Language Models
Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, Wang-Chiew Tan
2020-04-01
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
2020-03-23
Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett
2020-03-17
HypoNLI: Exploring the Artificial Patterns of Hypothesis-only Bias in Natural Language Inference
Tianyu Liu, Xin Zheng, Baobao Chang, Zhifang Sui
2020-03-05
jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, Ian Tenney, Samuel R. Bowman
2020-03-04
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
2020-02-19
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Carlos Aspillaga, Andrés Carvallo, Vladimir Araujo
2020-02-14
Application of Pre-training Models in Named Entity Recognition
Yu Wang, Yining Sun, Zuchang Ma, Lisheng Gao, Yang Xu, Ting Sun
2020-02-09
K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
2020-02-05
Beat the AI: Investigating Adversarial Human Annotations for Reading Comprehension
Max Bartolo, Alastair Roberts, Johannes Welbl, Sebastian Riedel, Pontus Stenetorp
2020-02-02
RobBERT: a Dutch RoBERTa-based Language Model
Pieter Delobelle, Thomas Winters, Bettina Berendt
2020-01-17
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures
Benita Kathleen Britto, Aditya Khandelwal
2020-01-09
BERT-AL: BERT for Arbitrarily Long Document Understanding
Ruixuan Zhang, Zhuoyu Wei, Yu Shi, Yining Chen
2020-01-01
Fooling Pre-trained Language Models: An Evolutionary Approach to Generate Wrong Sentences with High Acceptability Score
Anonymous
2020-01-01
Generating Biased Datasets for Neural Natural Language Processing
Anonymous
2020-01-01
oLMpics -- On what Language Model Pre-training Captures
Alon Talmor, Yanai Elazar, Yoav Goldberg, Jonathan Berant
2019-12-31
WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
James Yi Tian, Alexander P. Kreuzer, Pai-Hung Chen, Hans-Martin Will
2019-12-13
BERT has a Moral Compass: Improvements of ethical and moral values of machines
Patrick Schramowski, Cigdem Turan, Sophie Jentzsch, Constantin Rothkopf, Kristian Kersting
2019-12-11
Taking a Stance on Fake News: Towards Automatic Disinformation Assessment via Deep Bidirectional Transformer Language Models for Stance Detection
Chris Dulhanty, Jason L. Deglint, Ibrahim Ben Daya, Alexander Wong
2019-11-27
Evaluating Commonsense in Pre-trained Language Models
Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang
2019-11-27
Do Attention Heads in BERT Track Syntactic Dependencies?
Phu Mon Htut, Jason Phang, Shikha Bordia, Samuel R. Bowman
2019-11-27
What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning
Jaejun Lee, Raphael Tang, Jimmy Lin
2019-11-08
Blockwise Self-Attention for Long Document Understanding
Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, Jie Tang
2019-11-07
Deepening Hidden Representations from Pre-trained Language Models
Junjie Yang, Hai Zhao
2019-11-05
NSIT@NLP4IF-2019: Propaganda Detection from News Articles using Transfer Learning
Kartik Aggarwal, Anubhav Sadana
2019-11-01
When Choosing Plausible Alternatives, Clever Hans can be Clever
Pride Kavumba, Naoya Inoue, Benjamin Heinzerling, Keshav Singh, Paul Reisert, Kentaro Inui
2019-11-01
Masked Language Model Scoring
Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff
2019-10-31
Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task
Valeriya Slovikovskaya
2019-10-31
Technical report on Conversational Question Answering
Ying Ju, Fubang Zhao, Shijie Chen, Bowen Zheng, Xuefeng Yang, Yunfeng Liu
2019-09-24
SUPP.AI: Finding Evidence for Supplement-Drug Interactions
Lucy Lu Wang, Oyvind Tafjord, Arman Cohan, Sarthak Jain, Sam Skjonsberg, Carissa Schoenick, Nick Botner, Waleed Ammar
2019-09-17
Frustratingly Easy Natural Question Answering
Lin Pan, Rishav Chakravarti, Anthony Ferritto, Michael Glass, Alfio Gliozzo, Salim Roukos, Radu Florian, Avirup Sil
2019-09-11
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers, Iryna Gurevych
2019-08-27
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Sascha Rothe, Shashi Narayan, Aliaksei Severyn
2019-07-29
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
2019-07-26

Components

COMPONENT    TYPE
BERT         Language Models

Categories