The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine natural language understanding tasks: the single-sentence tasks CoLA and SST-2; the similarity and paraphrase tasks MRPC, STS-B and QQP; and the natural language inference tasks MNLI, QNLI, RTE and WNLI.
2,735 PAPERS • 25 BENCHMARKS
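The GLUE task taxonomy described above can be captured as a small lookup table. This is a minimal sketch: the category labels and the helper name `category_of` are illustrative choices, not part of any official GLUE tooling.

```python
# Task groupings as listed in the GLUE description (category labels are illustrative).
GLUE_TASKS: dict[str, list[str]] = {
    "single-sentence": ["CoLA", "SST-2"],
    "similarity and paraphrase": ["MRPC", "STS-B", "QQP"],
    "natural language inference": ["MNLI", "QNLI", "RTE", "WNLI"],
}


def category_of(task: str) -> str:
    """Return the GLUE category a task belongs to, or raise KeyError."""
    for category, tasks in GLUE_TASKS.items():
        if task in tasks:
            return category
    raise KeyError(f"unknown GLUE task: {task}")


# Flatten the table; the benchmark has nine tasks in total.
all_tasks = [task for tasks in GLUE_TASKS.values() for task in tasks]
assert len(all_tasks) == 9
```

For example, `category_of("RTE")` returns `"natural language inference"`.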
The Stanford Sentiment Treebank is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.
2,035 PAPERS • 9 BENCHMARKS
The Microsoft Research Paraphrase Corpus (MRPC) consists of 5,801 sentence pairs collected from newswire articles. Human annotators labelled each pair as either a paraphrase or not. The corpus is divided into a training subset (4,076 sentence pairs, of which 2,753 are paraphrases) and a test subset (1,725 pairs, of which 1,147 are paraphrases).
699 PAPERS • 5 BENCHMARKS
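The MRPC split sizes can be checked with a little arithmetic: the two subsets add up to the full corpus, and both are roughly two-thirds paraphrases. The sketch below uses only the counts stated in the description.

```python
# Split statistics as reported in the MRPC description.
train_pairs, train_paraphrases = 4_076, 2_753
test_pairs, test_paraphrases = 1_725, 1_147

# The training and test subsets together account for the whole corpus.
total_pairs = train_pairs + test_pairs
assert total_pairs == 5_801


def positive_rate(paraphrases: int, pairs: int) -> float:
    """Fraction of sentence pairs labelled as paraphrases."""
    return paraphrases / pairs


print(f"train: {positive_rate(train_paraphrases, train_pairs):.1%}")  # train: 67.5%
print(f"test:  {positive_rate(test_paraphrases, test_pairs):.1%}")    # test:  66.5%
```

Because the classes are imbalanced, a classifier that always answers "paraphrase" would already be correct on about 66.5% of test pairs, which is why accuracy on MRPC is usually read alongside F1.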
CaseHOLD (Case Holdings On Legal Decisions) is a legal dataset of over 53,000 multiple-choice questions that ask for the relevant holding of a cited case. The task is fundamental to lawyers and is both legally meaningful and difficult from an NLP perspective (a BiLSTM baseline reaches an F1 of 0.4). The citing context from the judicial decision serves as the prompt for each question. The answer choices are holding statements derived from the citations that follow text in legal decisions. Each citing text has five answer choices: the correct answer is the holding statement that corresponds to the citing text, and the four incorrect answers are other holding statements.
23 PAPERS • 2 BENCHMARKS
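The CaseHOLD question format (one citing prompt, five candidate holdings, one correct index) maps naturally onto a small record type. This is a sketch under assumptions: the class name `CaseHoldItem` and its field names are hypothetical and may not match the released data files, and accuracy is shown here only as a simple scoring example, not as the benchmark's official metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseHoldItem:
    # Hypothetical field names; the released files may use different ones.
    citing_prompt: str         # citing context from the judicial decision
    holdings: tuple[str, ...]  # the five candidate holding statements
    label: int                 # index (0-4) of the correct holding

    def __post_init__(self) -> None:
        # Enforce the format described above: five choices, one correct.
        if len(self.holdings) != 5:
            raise ValueError("each CaseHOLD item has exactly five answer choices")
        if not 0 <= self.label < 5:
            raise ValueError("label must index one of the five choices")


def accuracy(items: list[CaseHoldItem], predictions: list[int]) -> float:
    """Fraction of items whose predicted choice index matches the gold label."""
    if len(items) != len(predictions):
        raise ValueError("one prediction per item is required")
    correct = sum(item.label == pred for item, pred in zip(items, predictions))
    return correct / len(items)
```

With five choices per question, uniform random guessing would be expected to score 20% accuracy.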
This dataset is a Wikipedia dump split by relation for few-shot knowledge graph completion.
15 PAPERS • NO BENCHMARKS YET
MedConceptsQA - Open Source Medical Concepts QA Benchmark
12 PAPERS • 2 BENCHMARKS
The Few-Shot Object Learning (FewSOL) dataset supports object recognition from a few images per object. It contains 336 real-world objects with 9 RGB-D images per object, taken from different views. Object segmentation masks, object poses and object attributes are provided. In addition, the dataset is augmented with synthetic images generated from 330 3D object models. The FewSOL dataset can be used to study a range of few-shot object recognition problems, such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondence and attribute recognition.
4 PAPERS • NO BENCHMARKS YET
This dataset contains 3,689,229 English news articles on politics, gathered from 11 United States (US) media outlets covering a broad ideological spectrum.
2 PAPERS • NO BENCHMARKS YET
Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including currency recognition. To aid with this task, we present BankNote-Net, an open dataset for assistive currency recognition. The dataset consists of 24,816 embeddings of banknote images captured in a variety of assistive scenarios, spanning 17 currencies and 112 denominations. These compliant embeddings were learned using supervised contrastive learning and a MobileNetV2 architecture. They can be used to train and test specialized downstream models for any currency, including currencies not covered by our dataset and those for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the latest version of the Seeing AI app developed by Microsoft, which has over 100,000 monthly active users.
1 PAPER • NO BENCHMARKS YET
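One simple downstream model that fits the few-shot use of precomputed embeddings described above is a nearest-prototype (nearest class centroid) classifier: average the few support embeddings of each denomination, then assign a query to the most similar average. This is an illustrative sketch, not the BankNote-Net authors' method; the function names, the toy 3-dimensional vectors and the labels `USD_1` and `EUR_5` are hypothetical (real embeddings are higher-dimensional).

```python
import math

Vector = list[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def fit_prototypes(embeddings: list[Vector], labels: list[str]) -> dict[str, Vector]:
    """Average the normalized support embeddings of each label into a prototype."""
    sums: dict[str, Vector] = {}
    counts: dict[str, int] = {}
    for emb, label in zip(embeddings, labels):
        e = _normalize(emb)
        if label not in sums:
            sums[label] = [0.0] * len(e)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], e)]
        counts[label] += 1
    return {
        label: _normalize([s / counts[label] for s in vec])
        for label, vec in sums.items()
    }


def predict(prototypes: dict[str, Vector], query: Vector) -> str:
    """Return the label whose prototype has the highest cosine similarity to the query."""
    q = _normalize(query)
    # Both sides are unit vectors, so the dot product equals cosine similarity.
    return max(prototypes, key=lambda label: sum(a * b for a, b in zip(prototypes[label], q)))


# Toy usage with two support examples per (hypothetical) denomination.
support = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.2], [0.1, 0.9, 0.0]]
support_labels = ["USD_1", "USD_1", "EUR_5", "EUR_5"]
prototypes = fit_prototypes(support, support_labels)
print(predict(prototypes, [0.8, 0.2, 0.0]))  # USD_1
```

A nearest-prototype rule needs no gradient training, which makes it a convenient baseline when only a handful of images per denomination are available; a small trained classifier head on the same embeddings is a common alternative.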
Bongard-OpenWorld is a new benchmark for evaluating real-world few-shot reasoning for machine vision. We hope it can help us better understand the limitations of current visual intelligence and facilitate future research on visual agents with stronger few-shot visual reasoning capabilities.
1 PAPER • 1 BENCHMARK
The FewGLUE_64_labeled dataset is a new version of the FewGLUE dataset. It contains a 64-sample training set, a development set (the original SuperGLUE development set), a test set, and an unlabeled set. It was constructed to facilitate research on few-shot learning for natural language understanding tasks.
This dataset is tailored to the biotech news sector and aims to overcome the limitations of existing benchmarks. It consists of biotech news articles with complex content covering a variety of events, providing a more nuanced view of information extraction challenges.
0 PAPERS • NO BENCHMARKS YET