ALBERT is a Transformer architecture based on BERT but with far fewer parameters. It achieves this through two parameter-reduction techniques. The first is a factorized embedding parameterization: by decomposing the large vocabulary-embedding matrix into two smaller matrices, the size of the hidden layers is decoupled from the size of the vocabulary embedding. This makes it possible to grow the hidden size without significantly increasing the parameter count of the vocabulary embeddings. The second technique is cross-layer parameter sharing, which prevents the parameter count from growing with the depth of the network.
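A minimal sketch of these two ideas in PyTorch, assuming illustrative sizes and hypothetical class names (this is not the authors' implementation):

```python
import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Factorized embedding parameterization: the V x H table is split into V x E and E x H."""
    def __init__(self, vocab_size=30000, embedding_size=128, hidden_size=768):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, embedding_size)  # V x E lookup
        self.projection = nn.Linear(embedding_size, hidden_size)         # E x H projection

    def forward(self, input_ids):
        # Token ids -> small E-dimensional embeddings -> H-dimensional hidden states
        return self.projection(self.word_embeddings(input_ids))

class SharedLayerEncoder(nn.Module):
    """Cross-layer parameter sharing: one layer's weights are reused at every depth."""
    def __init__(self, hidden_size=768, num_heads=12, num_layers=12):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True)
        self.num_layers = num_layers

    def forward(self, hidden_states):
        for _ in range(self.num_layers):  # depth grows, parameter count does not
            hidden_states = self.shared_layer(hidden_states)
        return hidden_states

# Embedding parameters: V*H (untied, BERT-style) vs. V*E + E*H (factorized)
V, E, H = 30000, 128, 768
print(V * H)          # 23,040,000
print(V * E + E * H)  #  3,938,304
```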

Additionally, ALBERT uses a self-supervised loss for sentence-order prediction (SOP). SOP primarily focuses on inter-sentence coherence and is designed to address the ineffectiveness of the next-sentence prediction (NSP) loss proposed in the original BERT.
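In SOP, a positive example is two consecutive segments from the same document, and a negative example is the same two segments with their order swapped. A minimal sketch of building such pairs (hypothetical helper, not the authors' data pipeline):

```python
import random

def make_sop_example(segment_a, segment_b):
    """Build one sentence-order prediction (SOP) training example.

    segment_a and segment_b are assumed to be two consecutive text
    segments taken from the same document.

    Positive (label 1): segments kept in their original order.
    Negative (label 0): the same segments with their order swapped.
    Unlike BERT's NSP, the negative is not drawn from a different
    document, so the model must learn coherence rather than topic.
    """
    if random.random() < 0.5:
        return (segment_a, segment_b), 1  # original order
    return (segment_b, segment_a), 0      # swapped order
```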

Source: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Latest Papers

A Transformer Based Pitch Sequence Autoencoder with MIDI Augmentation
Mingshuo Ding, Yinghao Ma
2020-10-15
An Investigation on Different Underlying Quantization Schemes for Pre-trained Language Models
Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu
2020-10-14
Infusing Disease Knowledge into BERT for Health Question Answering, Medical Inference and Disease Name Recognition
Yun He, Ziwei Zhu, Yin Zhang, Qin Chen, James Caverlee
2020-10-08
Pretrained Language Model Embryology: The Birth of ALBERT
David C. Chiang, Sung-Feng Huang, Hung-Yi Lee
2020-10-06
On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers
Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, Dietrich Klakow
2020-10-06
BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context
Jean-Philippe Corbeil, Hadi Abdi Ghadivel
2020-09-25
BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition
Usman Naseem, Matloob Khushi, Vinay Reddy, Sakthivel Rajendran, Imran Razzak, Jinman Kim
2020-09-19
Learning Universal Representations from Word to Sentence
Yian Li, Hai Zhao
2020-09-10
Comparative Study of Language Models on Cross-Domain Data with Model Agnostic Explainability
Mayank Chhipa, Hrushikesh Mahesh Vazurkar, Abhijeet Kumar, Mridul Mishra
2020-09-09
ERNIE at SemEval-2020 Task 10: Learning Word Emphasis Selection by Pre-trained Language Model
Zhengjie Huang, Shikun Feng, Weiyue Su, Xuyi Chen, Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun
2020-09-08
UPB at SemEval-2020 Task 8: Joint Textual and Visual Modeling in a Multi-Task Learning Architecture for Memotion Analysis
George-Alexandru Vlad, George-Eduard Zaharia, Dumitru-Clementin Cercel, Costin-Gabriel Chiru, Stefan Trausan-Matu
2020-09-06
Variants of BERT, Random Forests and SVM approach for Multimodal Emotion-Target Sub-challenge
Hoang Manh Hung, Hyung-Jeong Yang, Soo-Hyung Kim, Guee-Sang Lee
2020-07-28
Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
Manoel Veríssimo dos Santos Neto, Ayrton Denner da Silva Amaral, Nádia Félix Felipe da Silva, Anderson da Silva Soares
2020-07-28
Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks
Diego de Vargas Feijo, Viviane Pereira Moreira
2020-07-19
LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model
Shilei Liu, Yu Guo, Bochao Li, Feiliang Ren
2020-07-06
A Transformer Approach to Contextual Sarcasm Detection in Twitter
Hunter Gregory, Steven Li, Pouya Mohammadi, Natalie Tarn, Rachel Draelos, Cynthia Rudin
2020-07-01
Deep Investing in Kyle's Single Period Model
Paul Friedrich, Josef Teichmann
2020-06-24
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow
2020-06-08
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
2020-06-07
BERT-based Ensembles for Modeling Disclosure and Support in Conversational Social Media Text
Tanvi Dadu, Kartikey Pant, Radhika Mamidi
2020-06-01
Language Representation Models for Fine-Grained Sentiment Classification
Brian Cheang, Bailey Wei, David Kogan, Howey Qiu, Masud Ahmed
2020-05-27
Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation
Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Shang-Wen Li, Hung-yi Lee
2020-05-18
ImpactCite: An XLNet-based method for Citation Impact Analysis
Dominique Mercier, Syed Tahseen Raza Rizvi, Vikas Rajashekar, Andreas Dengel, Sheraz Ahmed
2020-05-05
TAVAT: Token-Aware Virtual Adversarial Training for Language Understanding
Linyang Li, Xipeng Qiu
2020-04-30
Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting
Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu
2020-04-27
UHH-LT at SemEval-2020 Task 12: Fine-Tuning of Pre-Trained Transformer Networks for Offensive Language Detection
Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann
2020-04-23
Investigating the Effectiveness of Representations Based on Pretrained Transformer-based Language Models in Active Learning for Labelling Text Datasets
Jinghui Lu, Brian MacNamee
2020-04-21
Gestalt: a Stacking Ensemble for SQuAD2.0
Mohamed El-Geish
2020-04-02
Deep Entity Matching with Pre-Trained Language Models
Yuliang Li, Jinfeng Li, Yoshihiko Suhara, AnHai Doan, Wang-Chiew Tan
2020-04-01
Retrospective Reader for Machine Reading Comprehension
Zhuosheng Zhang, Junjie Yang, Hai Zhao
2020-01-27
PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Saurabh Goyal, Anamitra R. Choudhury, Saurabh M. Raje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma
2020-01-24
Perceiving the arrow of time in autoregressive motion
Kristof Meding, Dominik Janzing, Bernhard Schölkopf, Felix A. Wichmann
2019-12-01
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
2019-09-26