16 dataset results for Image Classification AND Texts

HumanEval

This is an evaluation harness for the HumanEval problem-solving dataset described in the paper "Evaluating Large Language Models Trained on Code". It is used to measure functional correctness for synthesizing programs from docstrings. It consists of 164 original programming problems assessing language comprehension, algorithms, and simple mathematics, some of which are comparable to simple software interview questions.

460 PAPERS • 1 BENCHMARK
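Functional correctness on HumanEval is usually summarized with pass@k: generate n samples per problem, count the c samples that pass the problem's unit tests, and estimate the probability that at least one of k samples would pass. A minimal sketch of the unbiased estimator 1 − C(n−c, k)/C(n, k) from the paper above, assuming the per-problem counts n and c have already been collected by running the tests; the function names are our own, not part of the harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of generated samples
    c: number of samples that pass all unit tests
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples fail, so pass@k becomes 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(pass_at_k(10, 3, 1))  # about 0.3; for k = 1 the estimate equals c / n
```

Computing the ratio of binomial coefficients directly avoids the high variance of simply drawing k samples and checking whether any pass.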

MINC (Materials in Context Database)

MINC is a large-scale, open dataset of materials in the wild.

53 PAPERS • NO BENCHMARKS YET

WOS

WOS (Web of Science Dataset)

Web of Science (WOS) is a document classification dataset that contains 46,985 documents labeled with 134 categories, which are grouped under 7 parent categories.

48 PAPERS • 3 BENCHMARKS

ImageNet-P

ImageNet-P contains images perturbed by noise, blur, weather, and digital distortions. The dataset provides validation perturbations and difficulty levels; comes in CIFAR-10, Tiny ImageNet, ImageNet 64 × 64, standard, and Inception-sized editions; and is designed for benchmarking networks, not training them. ImageNet-P differs from ImageNet-C in that it generates a perturbation sequence from each ImageNet validation image. Each sequence contains more than 30 frames, so to limit the growth in dataset size and evaluation time, only 10 common perturbations are used.

28 PAPERS • 1 BENCHMARK
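Because each ImageNet-P example is a sequence of frames rather than a single image, a model can be judged on how stable its predictions stay across the sequence. The sketch below computes the fraction of consecutive frame pairs on which the top-1 prediction changes, in the spirit of the flip-based stability measure used with ImageNet-P. It is a simplified illustration under that assumption; the benchmark's exact metric definition should be taken from the original paper, and the predicted labels here are hypothetical.

```python
def flip_rate(predictions: list[int]) -> float:
    """Fraction of consecutive frames whose top-1 prediction differs.

    predictions: top-1 class indices for frames 0..T-1 of one sequence.
    """
    if len(predictions) < 2:
        raise ValueError("need at least two frames")
    flips = sum(a != b for a, b in zip(predictions, predictions[1:]))
    return flips / (len(predictions) - 1)


def mean_flip_rate(sequences: list[list[int]]) -> float:
    """Average flip rate over several perturbation sequences."""
    return sum(flip_rate(s) for s in sequences) / len(sequences)


# Hypothetical predictions for a 5-frame sequence: 2 flips in 4 transitions.
print(flip_rate([3, 3, 7, 3, 3]))  # 0.5
```

A perfectly stable model scores 0, regardless of whether its predictions are correct, which is why stability is reported alongside accuracy rather than in place of it.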

ELEVATER

ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer)

The ELEVATER benchmark is a collection of resources for training, evaluating, and analyzing language-image models on image classification and object detection. ELEVATER consists of:

22 PAPERS • 2 BENCHMARKS

ETHOS

ETHOS (multi-labEl haTe speecH detectiOn dataSet)

ETHOS is a hate speech detection dataset. It is built from YouTube and Reddit comments validated through a crowdsourcing platform. It has two subsets, one for binary classification and the other for multi-label classification. The former contains 998 comments, while the latter contains fine-grained hate-speech annotations for 433 comments.

17 PAPERS • 2 BENCHMARKS
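The two ETHOS subsets call for different target encodings: the binary subset needs a single 0/1 label per comment, while the multi-label subset needs a multi-hot vector with one entry per hate-speech category, several of which may be set at once. A minimal sketch, assuming a hypothetical list of category names (`CATEGORIES`) in place of the label columns shipped with the actual dataset files:

```python
# Hypothetical category names; substitute the label columns of the real ETHOS files.
CATEGORIES = ["violence", "gender", "race", "religion"]


def binary_target(is_hate: bool) -> int:
    """Binary subset: one 0/1 label per comment."""
    return int(is_hate)


def multi_hot(labels: set[str]) -> list[int]:
    """Multi-label subset: one 0/1 entry per category; several may be 1."""
    unknown = labels - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [int(c in labels) for c in CATEGORIES]


print(binary_target(True))                # 1
print(multi_hot({"gender", "religion"}))  # [0, 1, 0, 1]
```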

CI-MNIST

CI-MNIST (Correlated and Imbalanced MNIST) is a variant of the MNIST dataset that introduces different types of correlations between attributes, dataset features, and an artificial eligibility criterion. For an input image $x$, the label $y \in \{1, 0\}$ indicates eligibility or ineligibility, respectively, according to whether the digit in $x$ is even or odd. The dataset treats the background color as the protected, or sensitive, attribute $s \in \{0, 1\}$, where blue denotes the unprivileged group and red denotes the privileged group. It was designed to evaluate bias-mitigation approaches in challenging setups while allowing control over different dataset configurations.

4 PAPERS • NO BENCHMARKS YET
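The labeling rule above can be written down directly: the eligibility label depends only on the parity of the digit, while the protected attribute is read off the background color. A minimal sketch, assuming $s = 0$ for blue (unprivileged) and $s = 1$ for red (privileged), which follows the order in which the values and colors are listed; the helper names are hypothetical.

```python
def eligibility_label(digit: int) -> int:
    """y = 1 (eligible) for even digits, y = 0 (ineligible) for odd digits."""
    if not 0 <= digit <= 9:
        raise ValueError("MNIST digits are 0-9")
    return 1 if digit % 2 == 0 else 0


# Assumed encoding of the protected attribute s from the background color.
BACKGROUND_TO_S = {"blue": 0, "red": 1}  # blue: unprivileged, red: privileged


def protected_attribute(background: str) -> int:
    """Map a background color to the protected attribute s."""
    return BACKGROUND_TO_S[background]


print(eligibility_label(4), protected_attribute("red"))   # 1 1
print(eligibility_label(7), protected_attribute("blue"))  # 0 0
```

Keeping $y$ and $s$ as separate functions makes it easy to inject a controlled correlation between them when sampling a training split, which is the kind of configuration the dataset is meant to expose.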

MuMiN

MuMiN is a misinformation graph dataset containing rich social media data (tweets, replies, users, images, articles, hashtags). It comprises 21 million tweets belonging to 26 thousand Twitter threads. The threads have been semantically linked to 13 thousand fact-checked claims across dozens of topics, events, and domains, in 41 different languages, spanning more than a decade.

4 PAPERS • 3 BENCHMARKS

BCNB (Early Breast Cancer Core-Needle Biopsy WSI)

Breast cancer (BC) has become the greatest threat to women’s health worldwide. Clinically, identifying axillary lymph node (ALN) metastasis and other clinical tumor characteristics, such as estrogen receptor (ER) and progesterone receptor (PR) status, is important for evaluating prognosis and guiding treatment for BC patients.

3 PAPERS • NO BENCHMARKS YET

Open Images V7

Open Images is a computer vision dataset covering about 9 million images with labels spanning thousands of object categories. A subset of 1.9 million images includes diverse annotation types.

3 PAPERS • NO BENCHMARKS YET

ACL-Fig

ACL-Fig is a large-scale automatically annotated corpus consisting of 112,052 scientific figures extracted from 56K research papers in the ACL Anthology. The ACL-Fig-pilot dataset contains 1,671 manually labeled scientific figures belonging to 19 categories.

1 PAPER • NO BENCHMARKS YET

MapReader Data

MapReader Data (in GeoHumanities workshop, SIGSPATIAL 2022)

MapReader in the GeoHumanities workshop (SIGSPATIAL 2022): gold standards and outputs.

1 PAPER • NO BENCHMARKS YET

MuMiN-large