Variational Dropout is a regularization technique based on dropout, but grounded in variational inference. In Variational Dropout, the same dropout mask is repeated at each time step for the inputs, outputs, and recurrent layers, so the same network units are dropped at every time step. This is in contrast to ordinary dropout, where a different mask is sampled at each time step, and only for the inputs and outputs.

Source: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
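
The key implementation detail is that one dropout mask is sampled per sequence and then reused at every time step, rather than resampling a fresh mask at each step. Below is a minimal PyTorch-style sketch of that idea (the class name LockedDropout, the (batch, time, features) layout, and the demo values are illustrative assumptions, not code from the cited paper); tying the recurrent masks as well requires applying the same trick inside the RNN cell, which this standalone module does not do.

import torch
import torch.nn as nn

class LockedDropout(nn.Module):
    # Variational ("locked") dropout sketch: the same mask is applied at every
    # time step of a sequence, so the same units are dropped throughout.
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is assumed to be shaped (batch, time, features).
        if not self.training or self.p == 0.0:
            return x
        # One Bernoulli keep-mask per sequence and feature, not per time step.
        mask = x.new_empty(x.size(0), 1, x.size(2)).bernoulli_(1 - self.p)
        mask = mask / (1 - self.p)  # inverted-dropout scaling
        return x * mask  # broadcasts the same mask across the time dimension

if __name__ == "__main__":
    x = torch.randn(4, 10, 8)           # (batch, time, features)
    drop = LockedDropout(p=0.3).train()
    y = drop(x)
    # A dropped feature is zero at every time step of its sequence.
    print((y == 0).all(dim=1).any())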

Latest Papers

PAPER DATE
Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset
Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li
2020-10-22
Memformer: The Memory-Augmented Transformer
Qingyang Wu, Zhenzhong Lan, Jing Gu, Zhou Yu
2020-10-14
Pagsusuri ng RNN-based Transfer Learning Technique sa Low-Resource Language (Analysis of an RNN-based Transfer Learning Technique on a Low-Resource Language)
Dan John Velasco
2020-10-13
Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora
2020-10-05
Fine-tuning Pre-trained Contextual Embeddings for Citation Content Analysis in Scholarly Publication
Haihua Chen, Huyen Nguyen
2020-09-12
Pay Attention when Required
Swetha Mandava, Szymon Migacz, Alex Fit Florea
2020-09-09
HinglishNLP: Fine-tuned Language Models for Hinglish Sentiment Detection
Meghana Bhange, Nirant Kasliwal
2020-08-22
The Jazz Transformer on the Front Line: Exploring the Shortcomings of AI-composed Music through Quantitative Measures
Shih-Lun Wu, Yi-Hsuan Yang
2020-08-04
Automatic Composition of Guitar Tabs by Transformers and Groove Modeling
Yu-Hua Chen, Yu-Hsiang Huang, Wen-Yi Hsiao, Yi-Hsuan Yang
2020-08-04
Language Modelling for Source Code with Transformer-XL
Thomas Dowdell, Hongyu Zhang
2020-07-31
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining
TJ Tsai, Kevin Ji
2020-07-29
VINNAS: Variational Inference-based Neural Network Architecture Search
Martin Ferianc, Hongxiang Fan, Miguel Rodrigues
2020-07-12
Do Transformers Need Deep Long-Range Memory?
Jack W. Rae, Ali Razavi
2020-07-07
Do Transformers Need Deep Long-Range Memory?
Jack Rae, Ali Razavi
2020-07-01
Probing for Referential Information in Language Models
Ionut-Teodor Sorodoc, Kristina Gulordava, Gemma Boleda
2020-07-01
Mind The Facts: Knowledge-Boosted Coherent Abstractive Text Summarization
Beliz Gunel, Chenguang Zhu, Michael Zeng, Xuedong Huang
2020-06-27
Exploring Transformers for Large-Scale Speech Recognition
Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong
2020-05-19
Improving Neural Language Generation with Spectrum Control
Lingxiao Wang, Jing Huang, Kevin Huang, Ziniu Hu, Guangtao Wang, Quanquan Gu
2020-05-01
Text Categorization for Conflict Event Annotation
Fredrik Olsson, Magnus Sahlgren, Fehmi ben Abdesslem, Ariel Ekgren, Kristine Eck
2020-05-01
Offensive language detection in Arabic using ULMFiT
Mohamed Abdellatif, Ahmed Elgammal
2020-05-01
Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings
Abdul Moeed, Yang An, Gerhard Hagerer, Georg Groh
2020-05-01
Bayesian Sparsification Methods for Deep Complex-valued Networks
Ivan Nazarov, Evgeny Burnaev
2020-03-25
Finnish Language Modeling with Deep Transformer Models
Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo
2020-03-14
Adaptive Neural Connections for Sparsity Learning
Prakhar Kaushik, Alex Gain, Hava Siegelmann
2020-03-05
Inferring the source of official texts: can SVM beat ULMFiT?
Pedro Henrique Luz de Araujo, Teófilo Emidio de Campos, Marcelo Magalhães Silva de Sousa
2020-03-02
MaxUp: A Simple Way to Improve Generalization of Neural Network Training
Chengyue Gong, Tongzheng Ren, Mao Ye, Qiang Liu
2020-02-20
Localized Flood Detection With Minimal Labeled Social Media Data Using Transfer Learning
Neha Singh, Nirmalya Roy, Aryya Gangopadhyay
2020-02-10
Variational Dropout Sparsification for Particle Identification speed-up
Artem Ryzhikov, Denis Derkach, Mikhail Hushchyn
2020-01-21
Delving Deeper into the Decoder for Video Captioning
Haoran Chen, Jianmin Li, Xiaolin Hu
2020-01-16
DeFINE: Deep Factorized Input Word Embeddings for Neural Sequence Modeling
Anonymous
2020-01-01
Improving Neural Language Generation with Spectrum Control
Anonymous
2020-01-01
Natural language processing of MIMIC-III clinical notes for identifying diagnosis and procedures with neural networks
Siddhartha Nuthakki, Sunil Neela, Judy W. Gichoya, Saptarshi Purkayastha
2019-12-28
A Comparative Study of Pretrained Language Models on Thai Social Text Categorization
Thanapapas Horsuwan, Kasidis Kanwatchara, Peerapon Vateekul, Boonserm Kijsirikul
2019-12-03
Neural Academic Paper Generation
Samet Demir, Uras Mutlu, Özgur Özdemir
2019-12-02
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta, Rik Koncel-Kedziorski, Mohammad Rastegari, Hannaneh Hajishirzi
2019-11-27
Improving Polyphonic Music Models with Feature-Rich Encoding
Omar Peracha
2019-11-26
A Subword Level Language Model for Bangla Language
Aisha Khatun, Anisur Rahman, Hemayet Ahmed Chowdhury, Md. Saiful Islam, Ayesha Tasnim
2019-11-15
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap
2019-11-13
Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model
Oleksii Hrinchuk, Mariya Popova, Boris Ginsburg
2019-10-23
Evolution of transfer learning in natural language processing
Aditya Malte, Pratik Ratadiya
2019-10-16
Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell
2019-10-13
The merits of Universal Language Model Fine-tuning for Small Datasets -- a case with Dutch book reviews
Benjamin van der Burgh, Suzan Verberne
2019-10-02
GDP: Generalized Device Placement for Dataflow Graphs
Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon
2019-09-28
A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning
Fang Liu, Ge Li, Bolin Wei, Xin Xia, Zhiyi Fu, Zhi Jin
2019-09-16
Ouroboros: On Accelerating Training of Transformer-Based Language Models
Qian Yang, Zhouyuan Huo, Wenlin Wang, Heng Huang, Lawrence Carin
2019-09-14
Analyzing Customer Feedback for Product Fit Prediction
Stephan Baier
2019-08-28
Low-Shot Classification: A Comparison of Classical and Deep Transfer Machine Learning Approaches
Peter Usherwood, Steven Smit
2019-07-17
Evaluating Language Model Finetuning Techniques for Low-resource Languages
Jan Christian Blaise Cruz, Charibeth Cheng
2019-06-30
A Tensorized Transformer for Language Modeling
Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou
2019-06-24
Posterior-Guided Neural Architecture Search
Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wenjun Zeng
2019-06-23
XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
2019-06-19
Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish
Renard Korzeniowski, Rafał Rolczyński, Przemysław Sadownik, Tomasz Korbak, Marcin Możejko
2019-06-17
Speak up, Fight Back! Detection of Social Media Disclosures of Sexual Harassment
Arijit Ghosh Chowdhury, Ramit Sawhney, Puneet Mathur, Debanjan Mahata, Rajiv Ratn Shah
2019-06-01
Figure Eight at SemEval-2019 Task 3: Ensemble of Transfer Learning Methods for Contextual Emotion Detection
Joan Xiao
2019-06-01
Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)
Mariya Toneva, Leila Wehbe
2019-05-28
Transformer-XL: Language Modeling with Longer-Term Dependency
Zihang Dai*, Zhilin Yang*, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
2019-05-01
An Empirical Evaluation of Text Representation Schemes on Multilingual Social Web to Filter the Textual Aggression
Sandip Modha, Prasenjit Majumder
2019-04-16
Low Resource Text Classification with ULMFit and Backtranslation
Sam Shleifer
2019-03-21
How Large a Vocabulary Does Text Classification Need? A Variational Approach to Vocabulary Selection
Wenhu Chen, Yu Su, Yilin Shen, Zhiyu Chen, Xifeng Yan, William Wang
2019-02-27
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
2019-01-09
Analysing Dropout and Compounding Errors in Neural Language Models
James O'Neill, Danushka Bollegala
2018-11-02
Variational Dropout via Empirical Bayes
Valery Kharitonov, Dmitry Molchanov, Dmitry Vetrov
2018-11-01
Language Informed Modeling of Code-Switched Text
Khyathi Chandu, Thomas Manzini, Sumeet Singh, Alan W. Black
2018-07-01
Adaptive Network Sparsification with Dependent Variational Beta-Bernoulli Dropout
Juho Lee, Saehoon Kim, Jaehong Yoon, Hae Beom Lee, Eunho Yang, Sung Ju Hwang
2018-05-28
Universal Language Model Fine-tuning for Text Classification
Jeremy Howard, Sebastian Ruder
2018-01-18
Differentially Private Variational Dropout
Beyza Ermis, Ali Taylan Cemgil
2017-11-30
Improved Bayesian Compression
Marco Federici, Karen Ullrich, Max Welling
2017-11-17
Alpha-Divergences in Variational Dropout
Bogdan Mazoure, Riashat Islam
2017-11-12
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen
2017-11-10
Regularizing and Optimizing LSTM Language Models
Stephen Merity, Nitish Shirish Keskar, Richard Socher
2017-08-07
Bayesian Sparsification of Recurrent Neural Networks
Ekaterina Lobacheva, Nadezhda Chirkova, Dmitry Vetrov
2017-07-31
Variational Dropout Sparsifies Deep Neural Networks
Dmitry Molchanov, Arsenii Ashukha, Dmitry Vetrov
2017-01-19
Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
2016-09-26
Multiplicative LSTM for sequence modelling
Ben Krause, Liang Lu, Iain Murray, Steve Renals
2016-09-26
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
Yarin Gal, Zoubin Ghahramani
2015-12-16

Components

No components found.

Categories