Gated Linear Unit

Introduced by Dauphin et al. in Language Modeling with Gated Convolutional Networks

A Gated Linear Unit, or GLU, computes:

$$ \text{GLU}\left(a, b\right) = a\otimes \sigma\left(b\right) $$

Here $\sigma$ is the sigmoid function and $\otimes$ denotes element-wise multiplication. GLUs are used in natural language processing architectures, for example the Gated CNN, where $\sigma\left(b\right)$ acts as a gate that controls what information from $a$ is passed on to the following layer. Intuitively, for a language modeling task, the gating mechanism allows the network to select the words or features that are important for predicting the next word. The GLU is non-linear, yet it provides a linear path for the gradient, which mitigates the vanishing gradient problem.
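As a concrete illustration, here is a minimal PyTorch sketch of this computation. The input sizes and the linear projection producing $a$ and $b$ are illustrative assumptions, not part of the original definition; PyTorch also exposes the same operation as `torch.nn.functional.glu`.

```python
import torch
import torch.nn.functional as F

def glu(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """GLU(a, b) = a ⊗ σ(b): σ(b) gates how much of a passes through."""
    return a * torch.sigmoid(b)

# Illustrative setup: as in the Gated CNN, a and b are the two halves of a
# single layer's output, so the projection maps to twice the output width.
x = torch.randn(4, 10)             # batch of 4, input dim 10 (assumed sizes)
proj = torch.nn.Linear(10, 2 * 6)  # value path and gate path, width 6 each
a, b = proj(x).chunk(2, dim=-1)    # split into values a and gate inputs b

out = glu(a, b)                    # shape (4, 6)
assert torch.allclose(out, F.glu(proj(x), dim=-1))  # matches the built-in
```

In the Gated CNN itself the projection is a convolution over the token sequence rather than a plain linear layer, but the gating step is identical.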

Source: Language Modeling with Gated Convolutional Networks

Latest Papers

Scientific Claim Verification with VERT5ERINI
Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin
2020-10-22
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
2020-10-22
Parameter Norm Growth During Training of Transformers
William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith
2020-10-19
Chatbot Interaction with Artificial Intelligence: Human Data Augmentation with T5 and Language Transformer Ensemble for Text Classification
Jordan J. Bird, Anikó Ekárt, Diego R. Faria
2020-10-12
TextSETTR: Label-Free Text Style Extraction and Tunable Targeted Restyling
Parker Riley, Noah Constant, Mandy Guo, Girish Kumar, David Uthus, Zarana Parekh
2020-10-08
Converting the Point of View of Messages Spoken to Virtual Assistants
Isabelle G. Lee, Vera Zu, Sai Srujana Buddi, Dennis Liang, Jack G. M. FitzGerald
2020-10-06
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Pascale Fung
2020-09-25
UCD-CS at W-NUT 2020 Shared Task-3: A Text to Text Approach for COVID-19 Event Extraction on Social Media
Congcong Wang, David Lillis
2020-09-21
PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo
2020-08-20
Lite Training Strategies for Portuguese-English and English-Portuguese Translation
Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini
2020-08-20
Learning from a Complementary-label Source Domain: Theory and Algorithms
Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu
2020-08-04
Clarinet: A One-step Approach Towards Budget-friendly Unsupervised Domain Adaptation
Yiyang Zhang, Feng Liu, Zhen Fang, Bo Yuan, Guangquan Zhang, Jie Lu
2020-07-29
Investigating Pretrained Language Models for Graph-to-Text Generation
Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich Schütze, Iryna Gurevych
2020-07-16
Reconstruction Bottlenecks in Object-Centric Generative Models
Martin Engelcke, Oiwi Parker Jones, Ingmar Posner
2020-07-13
HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
2020-07-12
Normalizador Neural de Datas e Endereços
Gustavo Plensack, Paulo Finardi
2020-06-27
CLARINET: A RISC-V Based Framework for Posit Arithmetic Empiricism
Riya Jain, Niraj Sharma, Farhad Merchant, Sachin Patkar, Rainer Leupers
2020-05-30
Text-to-Text Pre-Training for Data-to-Text Tasks
Mihir Kale
2020-05-21
$R^3$: Reverse, Retrieve, and Rank for Sarcasm Generation with Commonsense Knowledge
Tuhin Chakrabarty, Debanjan Ghosh, Smaranda Muresan, Nanyun Peng
2020-04-28
Evaluating Machines by their Real-World Language Use
Rowan Zellers, Ari Holtzman, Elizabeth Clark, Lianhui Qin, Ali Farhadi, Yejin Choi
2020-04-07
TTTTTackling WinoGrande Schemas
Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
2020-03-18
GLU Variants Improve Transformer
Noam Shazeer
2020-02-12
Parallel Neural Text-to-Speech
Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
2020-01-01
Make Lead Bias in Your Favor: Zero-shot Abstractive News Summarization
Chenguang Zhu, Ziyi Yang, Robert Gmyr, Michael Zeng, Xuedong Huang
2019-12-25
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
2019-10-23
GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
Martin Engelcke, Adam R. Kosiorek, Oiwi Parker Jones, Ingmar Posner
2019-07-30
Multi-Speaker End-to-End Speech Synthesis
Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping
2019-07-09
Learnable Gated Temporal Shift Module for Deep Video Inpainting
Ya-Liang Chang, Zhe Yu Liu, Kuan-Ying Lee, Winston Hsu
2019-07-02
Non-Autoregressive Neural Text-to-Speech
Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
2019-05-21
Neural source-filter waveform models for statistical parametric speech synthesis
Xin Wang, Shinji Takaki, Junichi Yamagishi
2019-04-27
AlphaStar: An Evolutionary Computation Perspective
Kai Arulkumaran, Antoine Cully, Julian Togelius
2019-02-05
Pay Less Attention with Lightweight and Dynamic Convolutions
Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli
2019-01-29
FloWaveNet: A Generative Flow for Raw Audio
Sungwon Kim, Sang-gil Lee, Jongyoon Song, Sungroh Yoon
2018-11-06
ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech
Wei Ping, Kainan Peng, Jitong Chen
2018-07-19
Free-Form Image Inpainting with Gated Convolution
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas Huang
2018-06-10
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
2017-10-20
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier
2016-12-23
