T5, or Text-to-Text Transfer Transformer, is a Transformer-based architecture that uses a text-to-text approach. Every task – including translation, question answering, and classification – is cast as feeding the model text as input and training it to generate some target text. This allows the same model, loss function, hyperparameters, etc. to be used across a diverse set of tasks (see the usage sketch below). The changes compared to BERT include:

  • adding a causal decoder to the bidirectional architecture.
  • replacing the fill-in-the-blank cloze task with a mix of alternative pre-training tasks.
Source: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
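
As a minimal illustration of the text-to-text interface, the sketch below runs several different tasks through a single pretrained checkpoint simply by prepending a task prefix to the input text. It assumes the Hugging Face transformers library and the public t5-small checkpoint, which are conveniences for illustration rather than part of the original T5 release.

```python
# Minimal sketch: one model, one decoding procedure, many tasks.
# Assumes the Hugging Face `transformers` library and the `t5-small` checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Tasks are distinguished only by a text prefix on the input;
# the model, loss, and generation code are identical across tasks.
inputs = [
    "translate English to German: The house is wonderful.",
    "cola sentence: The course is jumping well.",  # acceptability classification
    "summarize: studies have shown that owning a dog is good for you",
]

for text in inputs:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_length=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```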

Latest Papers

PAPER | DATE
Parameter Norm Growth During Training of Transformers
William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith
2020-10-19
Chatbot Interaction with Artificial Intelligence: Human Data Augmentation with T5 and Language Transformer Ensemble for Text Classification
Jordan J. Bird, Anikó Ekárt, Diego R. Faria
2020-10-12
TextSETTR: Label-Free Text Style Extraction and Tunable Targeted Restyling
Parker Riley, Noah Constant, Mandy Guo, Girish Kumar, David Uthus, Zarana Parekh
2020-10-08
Converting the Point of View of Messages Spoken to Virtual Assistants
Isabelle G. Lee, Vera Zu, Sai Srujana Buddi, Dennis Liang, Jack G. M. FitzGerald
2020-10-06
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems
Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Pascale Fung
2020-09-25
UCD-CS at W-NUT 2020 Shared Task-3: A Text to Text Approach for COVID-19 Event Extraction on Social Media
Congcong Wang, David Lillis
2020-09-21
PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data
Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo
2020-08-20
Lite Training Strategies for Portuguese-English and English-Portuguese Translation
Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini
2020-08-20
Investigating Pretrained Language Models for Graph-to-Text Generation
Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich Schütze, Iryna Gurevych
2020-07-16
HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan
2020-07-12
Normalizador Neural de Datas e Endereços (Neural Normalizer of Dates and Addresses)
Gustavo Plensack, Paulo Finardi
2020-06-27
Text-to-Text Pre-Training for Data-to-Text Tasks
Mihir Kale
2020-05-21
$R^3$: Reverse, Retrieve, and Rank for Sarcasm Generation with Commonsense Knowledge
Tuhin Chakrabarty, Debanjan Ghosh, Smaranda Muresan, Nanyun Peng
2020-04-28
Evaluating Machines by their Real-World Language Use
Rowan Zellers, Ari Holtzman, Elizabeth Clark, Lianhui Qin, Ali Farhadi, Yejin Choi
2020-04-07
TTTTTackling WinoGrande Schemas
Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
2020-03-18
Make Lead Bias in Your Favor: Zero-shot Abstractive News Summarization
Chenguang Zhu, Ziyi Yang, Robert Gmyr, Michael Zeng, Xuedong Huang
2019-12-25
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
2019-10-23
