Weight Tying

Introduced by Press and Wolf in Using the Output Embedding to Improve Language Models

Weight Tying shares the weights between the embedding and softmax layers, substantially reducing the total parameter count of a language model. The technique prevents the model from having to learn a one-to-one correspondence between the input and output, yielding substantial improvements over the standard LSTM language model. In the formulation of Inan et al., this is motivated by an additional term in the loss function that minimizes the KL divergence between the model's predictive distribution and an estimate of the true data distribution. Given the predictive distribution $y_{t}$ and the estimated true distribution $y^{*}_{t}$ for the $t^{\text{th}}$ example, the additional term is:

$$ J_{t} = D_{\text{KL}}\left(y^{*}_{t}\Vert{y_{t}}\right) $$

Source: Merity et al., Regularizing and Optimizing LSTM Language Models

This method was introduced independently by Press and Wolf and by Inan et al.

Source: Using the Output Embedding to Improve Language Models
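
Below is a minimal PyTorch-style sketch of weight tying in an LSTM language model, together with the augmented KL term described above. This is not the authors' reference implementation: the layer sizes and the temperature are illustrative, and forming the estimate $y^{*}_{t}$ from embedding similarities follows the description in Inan et al.

```python
# Minimal sketch of weight tying in an LSTM language model (PyTorch),
# plus the augmented KL loss of Inan et al. Not reference code: layer
# sizes and the temperature are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TiedLSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=400, hidden_dim=400):
        super().__init__()
        # Tying requires the decoder input size to equal the embedding size.
        assert emb_dim == hidden_dim
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(emb_dim, vocab_size, bias=False)
        # Weight tying: the softmax layer reuses the embedding matrix,
        # removing a vocab_size x emb_dim block of parameters.
        self.decoder.weight = self.embedding.weight

    def forward(self, tokens):                  # tokens: (batch, seq)
        emb = self.embedding(tokens)            # (batch, seq, emb_dim)
        hidden, _ = self.lstm(emb)              # (batch, seq, hidden_dim)
        return self.decoder(hidden)             # logits: (batch, seq, vocab)


def augmented_kl_loss(model, logits, targets, temperature=0.65):
    """J_t = KL(y*_t || y_t); y*_t is estimated from the similarity of the
    target word's embedding to all embeddings, following Inan et al."""
    log_probs = F.log_softmax(logits, dim=-1)                # log y_t
    with torch.no_grad():
        target_emb = model.embedding(targets)                # (batch, seq, emb_dim)
        sims = target_emb @ model.embedding.weight.t()       # (batch, seq, vocab)
        y_star = F.softmax(sims / temperature, dim=-1)       # estimate of y*_t
    return F.kl_div(log_probs, y_star, reduction="batchmean")


# Usage: standard cross-entropy plus the augmented KL term.
model = TiedLSTMLanguageModel(vocab_size=10_000)
tokens = torch.randint(0, 10_000, (8, 35))       # dummy input batch
targets = torch.randint(0, 10_000, (8, 35))      # dummy next-word targets
logits = model(tokens)
nll = F.cross_entropy(logits.reshape(-1, 10_000), targets.reshape(-1))
loss = nll + augmented_kl_loss(model, logits, targets)
```

The only change relative to an untied model is the single assignment `decoder.weight = embedding.weight`; everything else, including training, proceeds as usual.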

Latest Papers

PAPER DATE
Pagsusuri ng RNN-based Transfer Learning Technique sa Low-Resource Language
Dan John Velasco
2020-10-13
Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora
2020-10-05
Fine-tuning Pre-trained Contextual Embeddings for Citation Content Analysis in Scholarly Publication
Haihua Chen, Huyen Nguyen
2020-09-12
HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange, Nirant Kasliwal
2020-08-22
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining
TJ Tsai, Kevin Ji
2020-07-29
Probing for Referential Information in Language Models
Ionut-Teodor Sorodoc, Kristina Gulordava, Gemma Boleda
2020-07-01
Text Categorization for Conflict Event Annotation
Fredrik Olsson, Magnus Sahlgren, Fehmi ben Abdesslem, Ariel Ekgren, Kristine Eck
2020-05-01
Offensive language detection in Arabic using ULMFiT
Mohamed Abdellatif, Ahmed Elgammal
2020-05-01
Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings
Abdul Moeed, Yang An, Gerhard Hagerer, Georg Groh
2020-05-01
Inferring the source of official texts: can SVM beat ULMFiT?
Pedro Henrique Luz de Araujo, Teófilo Emidio de Campos, Marcelo Magalhães Silva de Sousa
2020-03-02
MaxUp: A Simple Way to Improve Generalization of Neural Network Training
Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu
2020-02-20
Localized Flood Detection With Minimal Labeled Social Media Data Using Transfer Learning
Neha Singh, Nirmalya Roy, Aryya Gangopadhyay
2020-02-10
Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks
Siddhartha Nuthakki, Sunil Neela, Judy W. Gichoya, Saptarshi Purkayastha
2019-12-28
A Comparative Study of Pretrained Language Models on Thai Social Text Categorization
Thanapapas Horsuwan, Kasidis Kanwatchara, Peerapon Vateekul, Boonserm Kijsirikul
2019-12-03
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
2019-11-27
A Subword Level Language Model for Bangla Language
Aisha Khatun, Anisur Rahman, Hemayet Ahmed Chowdhury, Md. Saiful Islam, Ayesha Tasnim
2019-11-15
Evolution of transfer learning in natural language processing
Aditya Malte, Pratik Ratadiya
2019-10-16
The merits of Universal Language Model Fine-tuning for Small Datasets -- a case with Dutch book reviews
Benjamin van der Burgh, Suzan Verberne
2019-10-02
Analyzing Customer Feedback for Product Fit Prediction
Stephan Baier
2019-08-28
Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks
Maxim Kodryan, Artem Grachev, Dmitry Ignatov, Dmitry Vetrov
2019-08-01
Representation Degeneration Problem in Training Natural Language Generation Models
Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu
2019-07-28
Low-Shot Classification: A Comparison of Classical and Deep Transfer Machine Learning Approaches
Peter Usherwood, Steven Smit
2019-07-17
Evaluating Language Model Finetuning Techniques for Low-resource Languages
Jan Christian Blaise Cruz, Charibeth Cheng
2019-06-30
Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish
Renard Korzeniowski, Rafał Rolczyński, Przemysław Sadownik, Tomasz Korbak, Marcin Możejko
2019-06-17
Speak up, Fight Back! Detection of Social Media Disclosures of Sexual Harassment
Arijit Ghosh Chowdhury, Ramit Sawhney, Puneet Mathur, Debanjan Mahata, Rajiv Ratn Shah
2019-06-01
Figure Eight at SemEval-2019 Task 3: Ensemble of Transfer Learning Methods for Contextual Emotion Detection
Joan Xiao
2019-06-01
An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression
Sandip Modha, Prasenjit Majumder
2019-04-16
Low Resource Text Classification with ULMFit and Backtranslation
Sam Shleifer
2019-03-21
Trellis Networks for Sequence Modeling
Shaojie Bai, J. Zico Kolter, Vladlen Koltun
2018-10-15
Language Informed Modeling of Code-Switched Text
Khyathi Chandu, Thomas Manzini, Sumeet Singh, Alan W. Black
2018-07-01
Universal Language Model Fine-tuning for Text Classification
Jeremy Howard, Sebastian Ruder
2018-01-18
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen
2017-11-10
Regularizing and Optimizing LSTM Language Models
Stephen Merity, Nitish Shirish Keskar, Richard Socher
2017-08-07
The University of Edinburgh's Neural MT Systems for WMT17
Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone, Philip Williams
2017-08-02
Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling
Hakan Inan, Khashayar Khosravi, Richard Socher
2016-11-04
Using the Output Embedding to Improve Language Models
Ofir Press, Lior Wolf
2016-08-20

Components


Categories