We introduce "talking-heads attention" - a variation on multi-head attention which includes linearprojections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additionalcomputation, talking-heads attention leads to better perplexities on masked language modeling tasks, aswell as better quality when transfer-learning to language comprehension and question answering tasks.
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks.
SOTA for Natural Language Inference on QNLI
We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.
SOTA for Common Sense Reasoning on SWAG
Community Question-Answering websites, such as StackOverflow and Quora, expect users to follow specific guidelines in order to maintain content quality.
These approaches corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens.
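As a rough illustration of that corruption step, here is a hedged sketch of replacing a random subset of tokens with [MASK]; the 15% rate, the all-[MASK] replacement, and the helper name are assumptions for illustration (BERT, for instance, also keeps or randomizes a fraction of the selected tokens).

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=None):
    """Replace a random subset of tokens with [MASK]; the model is then
    trained to predict the originals at the masked positions only."""
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(mask_token)
            targets.append(tok)     # supervised: reconstruct the original token
        else:
            corrupted.append(tok)
            targets.append(None)    # ignored by the loss
    return corrupted, targets

corrupted, targets = mask_tokens("the cat sat on the mat".split(), seed=0)
print(corrupted, targets)
```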
#3 best model for Semantic Textual Similarity on STS Benchmark
We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
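The following sketch shows the two noising steps described above, under simplifying assumptions: sentence order is shuffled, then contiguous token spans are replaced with a single mask token (text infilling). The span lengths drawn uniformly from 1-3 and the function name are illustrative; the paper samples span lengths from a Poisson distribution.

```python
import random

def shuffle_and_infill(sentences, mask_token="[MASK]", span_rate=0.3, seed=None):
    """Shuffle sentence order, then replace token spans with a single mask."""
    rng = random.Random(seed)
    # 1) Randomly permute the order of the original sentences.
    shuffled = sentences[:]
    rng.shuffle(shuffled)
    # 2) Text infilling: each masked span becomes exactly one mask token.
    noised = []
    for sent in shuffled:
        toks = sent.split()
        out, i = [], 0
        while i < len(toks):
            if rng.random() < span_rate:
                out.append(mask_token)      # one token for the whole span
                i += rng.randint(1, 3)      # span length (simplified)
            else:
                out.append(toks[i])
                i += 1
        noised.append(" ".join(out))
    return noised

print(shuffle_and_infill(["the cat sat .", "it was warm .", "birds sang ."], seed=0))
```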
#9 best model for Question Answering on SQuAD1.1 dev (F1 metric)
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
SOTA for Linguistic Acceptability on CoLA
Tasks: Common Sense Reasoning, Coreference Resolution, Document Summarization, Linguistic Acceptability, Machine Translation, Natural Language Inference, Question Answering, Semantic Textual Similarity, Sentiment Analysis, Text Classification, Transfer Learning, Word Sense Disambiguation
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging.
#5 best model for Semantic Textual Similarity on MRPC
Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.
#2 best model for Natural Language Inference on QNLI
With the capability of modeling bidirectional contexts, denoising autoencoding-based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.
SOTA for Text Classification on DBpedia