XLNet is an autoregressive Transformer that combines the strengths of autoregressive language modeling and autoencoding while attempting to avoid their limitations. Instead of using a fixed forward or backward factorization order as in conventional autoregressive models, XLNet maximizes the expected log-likelihood of a sequence over all possible permutations of the factorization order. Thanks to this permutation operation, the context for each position can consist of tokens from both left and right; in expectation, each position learns to use contextual information from all positions, i.e., to capture bidirectional context.
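
Concretely, let $\mathcal{Z}_T$ be the set of all permutations of the index sequence $[1, \dots, T]$, and let $z_t$ and $\mathbf{z}_{<t}$ denote the $t$-th element and the first $t-1$ elements of a permutation $\mathbf{z} \in \mathcal{Z}_T$. The permutation language modeling objective from the paper is

$$\max_{\theta} \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T} \left[ \sum_{t=1}^{T} \log p_{\theta}\left(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\right) \right]$$

Since the parameters $\theta$ are shared across all factorization orders, each position is trained, in expectation, to condition on every other position in the sequence.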

Additionally, inspired by recent advances in autoregressive language modeling, XLNet integrates the segment-level recurrence mechanism and relative positional encoding scheme of Transformer-XL into pretraining, which empirically improves performance, especially on tasks involving longer text sequences.
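
The recurrence can be sketched as below; this is an illustrative PyTorch fragment under assumed shapes and layer names, not the authors' implementation, and it omits the relative positional encodings that Transformer-XL applies inside attention:

```python
import torch
from torch import nn

# Illustrative sketch of segment-level recurrence (assumed shapes and
# names, not the released XLNet code). Hidden states cached from the
# previous segment serve as extra attention context; gradients are
# stopped so backpropagation stays within the current segment.
d_model, n_heads = 64, 4
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

def attend_with_memory(h_curr, h_prev):
    mem = h_prev.detach()                   # stop-gradient on the cached segment
    ctx = torch.cat([mem, h_curr], dim=1)   # keys/values: memory + current segment
    out, _ = attn(h_curr, ctx, ctx)         # queries: current segment only
    return out

seg_prev = torch.randn(2, 128, d_model)     # hidden states cached from segment t-1
seg_curr = torch.randn(2, 128, d_model)     # current segment's hidden states
h = attend_with_memory(seg_curr, seg_prev)  # shape: [2, 128, 64]
```

Because the memory is detached, each training step backpropagates only through the current segment while still attending over a longer effective context.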

Source: XLNet: Generalized Autoregressive Pretraining for Language Understanding

Latest Papers

AutoMeTS: The Autocomplete for Medical Text Simplification
Hoang Van, David Kauchak, Gondy Leroy
2020-10-20
NUIG-Shubhanker@Dravidian-CodeMix-FIRE2020: Sentiment Analysis of Code-Mixed Dravidian text using XLNet
Shubhanker Banerjee, Arun Jayapal, Sajeetha Thavareesan
2020-10-15
Aspect-based Document Similarity for Research Papers
Malte Ostendorff, Terry Ruas, Till Blume, Bela Gipp, Georg Rehm
2020-10-13
Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension
Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, Ngoc Thang Vu
2020-10-13
Automated Concatenation of Embeddings for Structured Prediction
Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu
2020-10-10
Analyzing Individual Neurons in Pre-trained Language Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Yonatan Belinkov
2020-10-06
How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
Shayne Longpre, Yu Wang, Christopher DuBois
2020-10-05
PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models' features for offensive language recognition
Piotr Janiszewski, Mateusz Skiba, Urszula Walińska
2020-10-05
Examining the rhetorical capacities of neural language models
Zining Zhu, Chuer Pan, Mohamed Abdalla, Frank Rudzicz
2020-10-01
Accelerating Multi-Model Inference by Merging DNNs of Different Weights
Joo Seong Jeong, Soojeong Kim, Gyeong-In Yu, Yunseong Lee, Byung-Gon Chun
2020-09-28
BET: A Backtranslation Approach for Easy Data Augmentation in Transformer-based Paraphrase Identification Context
Jean-Philippe Corbeil, Hadi Abdi Ghadivel
2020-09-25
Weird AI Yankovic: Generating Parody Lyrics
Mark Riedl
2020-09-25
Fine-tuning Pre-trained Contextual Embeddings for Citation Content Analysis in Scholarly Publication
Haihua Chen, Huyen Nguyen
2020-09-12
UPB at SemEval-2020 Task 6: Pretrained Language Models for Definition Extraction
Andrei-Marius Avram, Dumitru-Clementin Cercel, Costin-Gabriel Chiru
2020-09-11
Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
Murad Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman
2020-09-11
QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model
Pai Liu
2020-09-06
EdinburghNLP at WNUT-2020 Task 2: Leveraging Transformers with Generalized Augmentation for Identifying Informativeness in COVID-19 Tweets
Nickil Maveli
2020-09-06
A Multitask Deep Learning Approach for User Depression Detection on Sina Weibo
Yiding Wang, Zhenyi Wang, Chenghao Li, Yilin Zhang, Haizhou Wang
2020-08-26
KR-BERT: A Small-Scale Korean-Specific Language Model
Sangah Lee, Hansol Jang, Yunmee Baik, Suzi Park, Hyopil Shin
2020-08-10
Multi-node Bert-pretraining: Cost-efficient Approach
Jiahuang Lin, Xin Li, Gennady Pekhimenko
2020-08-01
Neural Machine Translation with Error Correction
Kaitao Song, Xu Tan, Jianfeng Lu
2020-07-21
Detecting Sarcasm in Conversation Context Using Transformer-Based Models
Adithya Avvaru, Sanath Vobilisetty, Radhika Mamidi
2020-07-01
Metaphor Detection Using Contextual Word Embeddings From Transformers
Jerry Liu, Nathan O'Hara, Alex Rubiner, Rachel Draelos, Cynthia Rudin
2020-07-01
A Transformer Approach to Contextual Sarcasm Detection in Twitter
Hunter Gregory, Steven Li, Pouya Mohammadi, Natalie Tarn, Rachel Draelos, Cynthia Rudin
2020-07-01
Transferring Monolingual Model to Low-Resource Language: The Case of Tigrinya
Abrhalei Tela, Abraham Woubie, Ville Hautamaki
2020-06-13
Using Large Pretrained Language Models for Answering User Queries from Product Specifications
Kalyani Roy, Smit Shah, Nithish Pai, Jaidam Ramtej, Prajit Prashant Nadkarn, Jyotirmoy Banerjee, Pawan Goyal, Surender Kumar
2020-05-29
A Comparative Study of Lexical Substitution Approaches based on Neural Language Models
Nikolay Arefyev, Boris Sheludko, Alexander Podolskiy, Alexander Panchenko
2020-05-29
ImpactCite: An XLNet-based method for Citation Impact Analysis
Dominique Mercier, Syed Tahseen Raza Rizvi, Vikas Rajashekar, Andreas Dengel, Sheraz Ahmed
2020-05-05
DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian
2020-05-02
Cross-lingual Information Retrieval with BERT
Zhuolin Jiang, Amro El-Jaroudi, William Hartmann, Damianos Karakos, Lingjun Zhao
2020-04-24
StereoSet: Measuring stereotypical bias in pretrained language models
Moin Nadeem, Anna Bethke, Siva Reddy
2020-04-20
MPNet: Masked and Permuted Pre-training for Language Understanding
Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu
2020-04-20
Poor Man's BERT: Smaller and Faster Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
2020-04-08
Exploiting Redundancy in Pre-trained Language Models for Efficient Transfer Learning
Fahim Dalvi, Hassan Sajjad, Nadir Durrani, Yonatan Belinkov
2020-04-08
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning
2020-03-23
Pairwise Multi-Class Document Classification for Semantic Relations between Wikipedia Articles
Malte Ostendorff, Terry Ruas, Moritz Schubotz, Georg Rehm, Bela Gipp
2020-03-22
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Carlos Aspillaga, Andrés Carvallo, Vladimir Araujo
2020-02-14
Resolving the Scope of Speculation and Negation using Transformer-Based Architectures
Benita Kathleen Britto, Aditya Khandelwal
2020-01-09
BERT-AL: BERT for Arbitrarily Long Document Understanding
Ruixuan Zhang, Zhuoyu Wei, Yu Shi, Yining Chen
2020-01-01
Clinical XLNet: Modeling Sequential Clinical Notes and Predicting Prolonged Mechanical Ventilation
Kexin Huang, Abhishek Singh, Sitong Chen, Edward T. Moseley, Chih-ying Deng, Naomi George, Charlotta Lindvall
2019-12-27
WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
James Yi Tian, Alexander P. Kreuzer, Pai-Hung Chen, Hans-Martin Will
2019-12-13
An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering
Shayne Longpre, Yi Lu, Zhucheng Tu, Chris DuBois
2019-12-04
Evaluating Commonsense in Pre-trained Language Models
Xuhui Zhou, Yue Zhang, Leyang Cui, Dandan Huang
2019-11-27
Low Rank Factorization for Compact Multi-Head Self-Attention
Sneha Mehta, Huzefa Rangwala, Naren Ramakrishnan
2019-11-26
Attending to Entities for Better Text Understanding
Pengxiang Cheng, Katrin Erk
2019-11-11
IIT-KGP at COIN 2019: Using pre-trained Language Models for modeling Machine Comprehension
Prakhar Sharma, Sumegh Roychowdhury
2019-11-01
Pingan Smart Health and SJTU at COIN - Shared Task: utilizing Pre-trained Language Models and Common-sense Knowledge in Machine Reading Tasks
Xiepeng Li, Zhexi Zhang, Wei Zhu, Zheng Li, Yuan Ni, Peng Gao, Junchi Yan, Guotong Xie
2019-11-01
FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm
Yuzhong Hong, Xianguo Yu, Neng He, Nan Liu, Junhui Liu
2019-11-01
Generalizing Question Answering System with Pre-trained Language Model Fine-tuning
Dan Su, Yan Xu, Genta Indra Winata, Peng Xu, Hyeondey Kim, Zihan Liu, Pascale Fung
2019-11-01
Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task
Valeriya Slovikovskaya
2019-10-31
Modeling Inter-Speaker Relationship in XLNet for Contextual Spoken Language Understanding
Jonggu Kim, Jong-Hyeok Lee
2019-10-28
Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks
Xingchen Song, Guangsen Wang, Zhiyong Wu, Yiheng Huang, Dan Su, Dong Yu, Helen Meng
2019-10-23
XL-Editor: Post-editing Sentences with XLNet
Yong-Siang Shih, Wei-Cheng Chang, Yiming Yang
2019-10-19
Multilingual Question Answering from Formatted Text applied to Conversational Agents
Wissam Siblini, Charlotte Pasqual, Axel Lavielle, Cyril Cauchois
2019-10-10
Extreme Language Model Compression with Optimal Subwords and Shared Projections
Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou
2019-09-25
Language models and Automated Essay Scoring
Pedro Uria Rodriguez, Amir Jafari, Christopher M. Ormerod
2019-09-18
Frustratingly Easy Natural Question Answering
Lin Pan, Rishav Chakravarti, Anthony Ferritto, Michael Glass, Alfio Gliozzo, Salim Roukos, Radu Florian, Avirup Sil
2019-09-11
Reasoning Over Semantic-Level Graph for Fact Checking
Wanjun Zhong, Jingjing Xu, Duyu Tang, Zenan Xu, Nan Duan, Ming Zhou, Jiahai Wang, Jian Yin
2019-09-09
Transfer Learning Robustness in Multi-Class Categorization by Fine-Tuning Pre-Trained Contextualized Language Models
Xinyi Liu, Artit Wangperawong
2019-09-08
Integrating Multimodal Information in Large Pretrained Transformers
Wasifur Rahman, Md. Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, Ehsan Hoque
2019-08-15
Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding
Oren Barkan, Noam Razin, Itzik Malkiel, Ori Katz, Avi Caciularu, Noam Koenigstein
2019-08-14
BioFLAIR: Pretrained Pooled Contextualized Embeddings for Biomedical Sequence Labeling Tasks
Shreyas Sharma, Ron Daniel Jr
2019-08-13
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang
2019-07-29
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
2019-06-19
