DistilBERT is a small, fast, cheap and light Transformer model based on the BERT architecture. Knowledge distillation is performed during the pre-training phase to reduce the size of a BERT model by 40%. To leverage the inductive biases learned by larger models during pre-training, the authors introduce a triple loss combining language modeling, distillation and cosine-distance losses.

Source: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
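The triple loss can be illustrated with a short PyTorch-style sketch. This is a minimal, assumed implementation: the function name, loss weights, and temperature are illustrative rather than the paper's exact configuration, and the released training code additionally restricts some terms to masked or non-padded positions, which this simplification omits.

```python
import torch
import torch.nn.functional as F

def distilbert_triple_loss(student_logits, teacher_logits,
                           student_hidden, teacher_hidden,
                           mlm_labels, temperature=2.0,
                           alpha_distil=1.0, alpha_mlm=1.0, alpha_cos=1.0):
    """Sketch of the three DistilBERT training objectives (illustrative weights)."""
    # (1) Distillation loss: soften student and teacher output distributions
    #     with a temperature and match them via KL divergence.
    distil_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # (2) Masked language modeling loss: cross-entropy on masked tokens
    #     (non-masked positions carry the ignore label -100).
    mlm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        mlm_labels.view(-1),
        ignore_index=-100,
    )

    # (3) Cosine-distance loss: align the directions of student and teacher
    #     hidden-state vectors (DistilBERT keeps the teacher's hidden size).
    s = student_hidden.view(-1, student_hidden.size(-1))
    t = teacher_hidden.view(-1, teacher_hidden.size(-1))
    cos_loss = F.cosine_embedding_loss(s, t, torch.ones(s.size(0), device=s.device))

    return alpha_distil * distil_loss + alpha_mlm * mlm_loss + alpha_cos * cos_loss
```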

Latest Papers

PAPER / AUTHORS / DATE
Probing for Multilingual Numerical Understanding in Transformer-Based Language Models
Devin Johnson, Denise Mak, Drew Barker, Lexi Loessberg-Zahl
2020-10-13
Chatbot Interaction with Artificial Intelligence: Human Data Augmentation with T5 and Language Transformer Ensemble for Text Classification
Jordan J. Bird, Anikó Ekárt, Diego R. Faria
2020-10-12
Compressing Transformer-Based Semantic Parsing Models using Compositional Code Embeddings
Prafull Prakash, Saurabh Kumar Shashidhar, Wenlong Zhao, Subendhu Rongali, Haidar Khan, Michael Kayser
2020-10-10
Deep Learning Meets Projective Clustering
Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman
2020-10-08
Optimizing Transformers with Approximate Computing for Faster, Smaller and more Accurate NLP Models
Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, Anand Raghunathan
2020-10-07
Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation
Minki Kang, Moonsu Han, Sung Ju Hwang
2020-10-06
Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding
2020-09-17
Compositional and Lexical Semantics in RoBERTa, BERT and DistilBERT: A Case Study on CoQA
Ieva Staliūnaitė, Ignacio Iacobacci
2020-09-17
Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
Murad Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman
2020-09-11
Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability
Mayank Chhipa, Hrushikesh Mahesh Vazurkar, Abhijeet Kumar, Mridul Mishra
2020-09-09
Sentimental LIAR: Extended Corpus and Deep Learning Models for Fake Claim Classification
Bibek Upadhayay, Vahid Behzadan
2020-09-01
Cooking Is All About People: Comment Classification On Cookery Channels Using BERT and Classification Models (Malayalam-English Mix-Code)
Subramaniam Kazhuparambil, Abhishek Kaushik
2020-06-15
Accelerating Natural Language Understanding in Task-Oriented Dialog
Ojas Ahuja, Shrey Desai
2020-06-05
LRG at SemEval-2020 Task 7: Assessing the Ability of BERT and Derivative Models to Perform Short-Edits based Humor Grading
Siddhant Mahurkar, Rajaswa Patil
2020-05-31
Language Representation Models for Fine-Grained Sentiment Classification
Brian Cheang, Bailey Wei, David Kogan, Howey Qiu, Masud Ahmed
2020-05-27
Establishing Baselines for Text Classification in Low-Resource Languages
Jan Christian Blaise Cruz, Charibeth Cheng
2020-05-05
Analyzing ELMo and DistilBERT on Socio-political News Classification
Berfu B{\"u}y{\"u}k{\"o}zAli H{\"u}rriyeto{\u{g}}luArzucan {\"O}zg{\"u}r
2020-05-01
Poor Man's BERT: Smaller and Faster Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
2020-04-08
Improved Pretraining for Domain-specific Contextual Embedding Models
Subendhu Rongali, Abhyuday Jagannatha, Bhanu Pratap Singh Rawat, Hong Yu
2020-04-05
Deep Entity Matching with Pre-Trained Language Models
Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, Wang-Chiew Tan
2020-04-01
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
2019-10-02

Components

COMPONENT    TYPE
BERT         Language Models

Categories