🔔 Share your dataset with the ML community!

Filter by Modality

Filter by Task (clear)

Filter by Language

240 dataset results for Image Classification

PMData

The PMData dataset aims to combine the traditional lifelogging with sports activity logging.

4 PAPERS • NO BENCHMARKS YET

RF100 (Roboflow 100)

The evaluation of object detection models is usually performed by optimizing a single metric, e.g. mAP, on a fixed set of datasets, e.g. Microsoft COCO and Pascal VOC. Due to image retrieval and annotation costs, these datasets consist largely of images found on the web and do not represent many real-life domains that are being modelled in practice, e.g. satellite, microscopic and gaming, making it difficult to assert the degree of generalization learned by the model.

4 PAPERS • 1 BENCHMARK

SIPaKMeD

SIPaKMeD (SIPaKMeD Pap Smear dataset)

a high-level explanation of the dataset characteristics explain motivations and summary of its content potential use cases of the dataset

4 PAPERS • 1 BENCHMARK

Sewer-ML

Sewer-ML is a sewer defect dataset. It contains 1.3 million images, from 75,618 videos collected from three Danish water utility companies over nine years. All videos have been annotated by licensed sewer inspectors following the Danish sewer inspection standard, Fotomanualen. This leads to consistent and reliable annotations, and a total of 17 annotated defect classes.

4 PAPERS • NO BENCHMARKS YET

Sports10

Games dataset containing 100,000 Gameplay Images of 175 Video Games across 10 Sports Genres - AMERICAN FOOTBALL, BASKETBALL, BIKE RACING, CAR RACING, FIGHTING, HOCKEY, SOCCER, TABLE TENNIS, TENNIS.

4 PAPERS • 2 BENCHMARKS

Atlas

Atlas is a dataset for e-commerce clothing product categorization. The Atlas dataset consists of a high-quality product taxonomy dataset focusing on clothing products which contain 186,150 images under clothing category with 3 levels and 52 leaf nodes in the taxonomy.

3 PAPERS • NO BENCHMARKS YET

BCNB (Early Breast Cancer Core-Needle Biopsy WSI)

Breast cancer (BC) has become the greatest threat to women’s health worldwide. Clinically, identification of axillary lymph node (ALN) metastasis and other tumor clinical characteristics such as ER, PR, and so on, are important for evaluating the prognosis and guiding the treatment for BC patients.

3 PAPERS • NO BENCHMARKS YET

DEIC Benchmark (Data-Efficient Image Classification Benchmark)

DEIC is a benchmark for measuring the data efficiency of models in the context of image classification. It is composed of 6 datasets that contain a small number of training samples per class (i.e., 30 < x < 80). It covers multiple image domains (i.e., natural images, fine-grained recognition, medical images, remote sensing, handwriting recognition) and data types (i.e., RGB, grayscale, multi-spectral).

3 PAPERS • 5 BENCHMARKS

Dirty-MNIST

DirtyMNIST is a concatenation of MNIST + AmbiguousMNIST, with 60k samples each in the training set. AmbiguousMNIST contains additional ambiguous digits with varying ambiguity. The AmbiguousMNIST test set contains 60k ambiguous samples as well.

3 PAPERS • NO BENCHMARKS YET

HErlev

HErlev (HErlev Pap Smear Dataset)

3 PAPERS • 1 BENCHMARK

Image and Video Advertisements

The Image and Video Advertisements collection consists of an image dataset of 64,832 image ads, and a video dataset of 3,477 ads. The data contains rich annotations encompassing the topic and sentiment of the ads, questions and answers describing what actions the viewer is prompted to take and the reasoning that the ad presents to persuade the viewer ("What should I do according to this ad, and why should I do it? "), and symbolic references ads make (e.g. a dove symbolizes peace).

3 PAPERS • NO BENCHMARKS YET

KTH-TIPS2

The KTH-TIPS (Textures under varying Illumination, Pose and Scale) image database was created to extend the CUReT database in two directions, by providing variations in scale as well as pose and illumination, and by imaging other samples of a subset of its materials in different settings.

3 PAPERS • 1 BENCHMARK

LKS (Liver Kidney Stomach)

LKS is a dataset of 684 Liver-Kidney-Stomach immunofluorescence whole slide images (WSIs) used in the investigation of autoimmune liver disease.

3 PAPERS • NO BENCHMARKS YET

LSA16 (Lengua de Señas Argentina - 16 Handshapes classes)

This database contains images of 16 handshapes of the Argentinian Sign Language (LSA), each performed 5 times by 10 different subjects, for a total of 800 images. The subjects wore color hand gloves and dark clothes.

3 PAPERS • 1 BENCHMARK

MIMIC-CXR-LT (long-tailed version of MIMIC-CXR)

MIMIC-CXR-LT. We construct a single-label, long-tailed version of MIMIC-CXR in a similar manner. MIMIC-CXR is a multi-label classification dataset with over 200,000 chest X-rays labeled with 13 pathologies and a “No Findings” class. The resulting MIMIC-CXR-LT dataset contains 19 classes, of which 10 are head classes, 6 are medium classes, and 3 are tail classes. MIMIC-CXR-LT contains 111,792 images labeled with one of 18 diseases, with 87,493 training images and 23,550 test set images. The validation and balanced test sets contain 15 and 30 images per class, respectively.

3 PAPERS • 1 BENCHMARK

NIH-CXR-LT (Long-tailed (LT) NIH ChestXRay14)

NIH-CXR-LT. NIH ChestXRay14 contains over 100,000 chest X-rays labeled with 14 pathologies, plus a “No Findings” class. We construct a single-label, long-tailed version of the NIH ChestXRay14 dataset by introducing five new disease findings described above. The resulting NIH-CXR-LT dataset has 20 classes, including 7 head classes, 10 medium classes, and 3 tail classes. NIH-CXR-LT contains 88,637 images labeled with one of 19 thorax diseases, with 68,058 training and 20,279 test images. The validation and balanced test sets contain 15 and 30 images per class, respectively.

3 PAPERS • 1 BENCHMARK

OFDIW (OnFocus Detection In the Wild)

OnFocus Detection In the Wild (OFDIW) is an onfocus detection dataset. It consists of 20,623 images in unconstrained capture conditions (thus called "in the wild'') and contains individuals with diverse emotions, ages, facial characteristics, and rich interactions with surrounding objects and background scenes. The images are collected from the LFW dataset and the Oxford-IIIT Pet dataset. Onfocus detection aims at identifying whether the focus of the individual captured by a camera is on the camera or not.

3 PAPERS • NO BENCHMARKS YET

SI-Score

SI-SCORE is a synthetic dataset for the analysis of robustness to object location, rotation and size. It consists of images that vary only for factors like object size and object location.

3 PAPERS • NO BENCHMARKS YET

SIDD-Image (Segmented Intrusion Detection Dataset)

This is the first image-based network intrusion detection dataset. This large-scale dataset included network traffic protocol communication-based images from 15 different observation locations of different countries in Asia. This dataset is used to identify two different types of anomalies from benign network traffic. Each image with a size of 48 × 48 contains multi-protocol communications within 128 seconds. The SIDD dataset can be to applied to a broad range of tasks such as machine learning-based network intrusion detection, non-iid federated learning, and so forth.

3 PAPERS • 1 BENCHMARK

Stream-51

A new dataset for streaming classification consisting of temporally correlated images from 51 distinct object categories and additional evaluation classes outside of the training distribution to test novelty recognition.

3 PAPERS • NO BENCHMARKS YET

ASIRRA

ASIRRA ((Animal Species Image Recognition for Restricting Access)

Web services are often protected with a challenge that's supposed to be easy for people to solve, but difficult for computers. Such a challenge is often called a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) or HIP (Human Interactive Proof). HIPs are used for many purposes, such as to reduce email and blog spam and prevent brute-force attacks on web site passwords.

2 PAPERS • NO BENCHMARKS YET

AdvNet

AdvNet is a dataset of traffic signs images. Specifically, it includes adversarial traffic sign images (i.e., pictures of traffic signs with stickers on their surface) that can fool state-of-the-art neural network-based perception systems and clean traffic sign images without any stickers on them.

2 PAPERS • NO BENCHMARKS YET

Animals-10

It contains about 28K medium quality animal images belonging to 10 categories: dog, cat, horse, spyder, butterfly, chicken, sheep, cow, squirrel, and elephant.

2 PAPERS • 1 BENCHMARK

CARBEN (Composite Adversarial Robustness Benchmark)

Prior literature on adversarial attack methods has mainly focused on attacking with and defending against a single threat model, e.g., perturbations bounded in Lp ball. However, multiple threat models can be combined into composite perturbations. One such approach, composite adversarial attack (CAA), not only expands the perturbable space of the image, but also may be overlooked by current modes of robustness evaluation. To this end, we proposed CARBEN, a benchmark of composite adversarial robustness that accurately reflects the composite robustness of the considered models.

2 PAPERS • NO BENCHMARKS YET

Cross-View Time Dataset

The appearance of the world varies dramatically not only from place to place but also from hour to hour and month to month. Every day billions of images capture this complex relationship, many of which are associated with precise time and location metadata. We propose to use these images to construct a global-scale, dynamic map of visual appearance attributes. Such a map enables fine-grained understanding of the expected appearance at any geographic location and time. Our approach integrates dense overhead imagery with location and time metadata into a general framework capable of mapping a wide variety of visual attributes. A key feature of our approach is that it requires no manual data annotation. We demonstrate how this approach can support various applications, including image-driven mapping, image geolocalization, and metadata verification.

2 PAPERS • 1 BENCHMARK

Deep PCB (Deep Printed Circuit Board)

DeepPCB

2 PAPERS • 1 BENCHMARK

Endotect Polyp Segmentation Challenge Dataset

A challenge that consists of three tasks, each targeting a different requirement for in-clinic use. The first task involves classifying images from the GI tract into 23 distinct classes. The second task focuses on efficiant classification measured by the amount of time spent processing each image. The last task relates to automatcially segmenting polyps.

2 PAPERS • 1 BENCHMARK

FMD (materials)

FMD (materials) (Flickr Material Dataset)

Sharan, Lavanya, Ruth Rosenholtz, and Edward Adelson. "Material perception: What can you see in a brief glance?." Journal of Vision 9.8 (2009): 784-784. http://people.csail.mit.edu/celiu/CVPR2010/FMD/FMD.zip

2 PAPERS • 1 BENCHMARK

HiAML

HiAML Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23. Contains 4.6k CIFAR-10 networks with an accuracy range of [91.11%, 93.44%].

2 PAPERS • NO BENCHMARKS YET

ImageNet-Hard

ImageNet-Hard is a new benchmark that comprises 10,980 images collected from various existing ImageNet-scale benchmarks (ImageNet, ImageNet-V2, ImageNet-Sketch, ImageNet-C, ImageNet-R, ImageNet-ReaL, ImageNet-A, and ObjectNet). This dataset poses a significant challenge to state-of-the-art vision models as merely zooming in often fails to improve their ability to classify images correctly. As a result, even the most advanced models, such as CLIP-ViT-L/14@336px, struggle to perform well on this dataset, achieving a mere 2.02% accuracy.

2 PAPERS • 1 BENCHMARK

Inception

Inception Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23. Contains 580 CIFAR-10 networks with an accuracy range of [89.08%, 94.03%].

2 PAPERS • NO BENCHMARKS YET

Kvasir-Capsule

Kvasir-Capsule dataset is the largest publicly released VCE dataset. In total, the dataset contains 47,238 labeled images and 117 videos, where it captures anatomical landmarks and pathological and normal findings. The results is more than 4,741,621 images and video frames altogether.

2 PAPERS • NO BENCHMARKS YET

MAMe (Museum Art Medium dataset)

The MAMe dataset contains images of high-resolution and variable shape of artworks from 3 different museums:

2 PAPERS • 1 BENCHMARK

PolSF

Collects five open polarimetric SAR images, which are images of the San Francisco area. These five images come from different satellites at different times, which has great scientific research value.

2 PAPERS • NO BENCHMARKS YET

Two-Path

Two-Path Computational Graph (CG) family introduced in "GENNAPE: Towards Generalized Neural Architecture Performance Estimators", accepted to AAAI-23. Contains 6.9k CIFAR-10 networks with an accuracy range of [85.53%, 92.34%].

2 PAPERS • NO BENCHMARKS YET

Vistas-NP

The Vistas-NP dataset is an out-of-distribution detection dataset based on the Mapillary Vistas dataset. The original Vistas dataset consists of 18,000 training images and 2,000 validation images with 66 classes. In Vistas-NP the human classes are used as outliers due to their dispersion across scenes and visual diversity from other objects. The dataset is created by excluding all images with class person and the three rider classes to the test subset. Consequently, the dataset has 8,003 train images and 830 validation images. The test set contains 11,167.

2 PAPERS • NO BENCHMARKS YET

WHU-Hi (Wuhan UAV-borne hyperspectral image)

WHU-Hi dataset (Wuhan UAV-borne hyperspectral image) is collected and shared by the RSIDEA research group of Wuhan University, and it could serve as a benchmark dataset for precise crop classification and hyperspectral image classification studies. The WHU-Hi dataset contains three individual UAV-borne hyperspectral datasets: WHU-Hi-LongKou, WHU-Hi-HanChuan, and WHU-Hi-HongHu. All the datasets were acquired in farming areas with various crop types in Hubei province, China, via a Headwall Nano-Hyperspec sensor mounted on a UAV platform. Compared with spaceborne and airborne hyperspectral platforms, unmanned aerial vehicle (UAV)-borne hyperspectral systems can acquire hyperspectral imagery with a high spatial resolution (which we refer to here as H2 imagery). The research was published in Remote Sensing of Environment.

2 PAPERS • NO BENCHMARKS YET

ACL-Fig

ACL-Fig is a large-scale automatically annotated corpus consisting of 112,052 scientific figures extracted from 56K research papers in the ACL Anthology. The ACL-Fig-pilot dataset contains 1,671 manually labeled scientific figures belonging to 19 categories.

1 PAPER • NO BENCHMARKS YET

ARC-100

The ARC-100 dataset was collected as part of a prototype retail checkout system titled ARC (Automatic Retail Checkout). It consists of 31,000 $640\times480$ RGB images of 100 commonly found retail items in Lahore, Pakistan. Each retail item has 310 images captured at various logical orientations (on a black, matte finish conveyor belt) by a Logitech C310 webcam, under a wooden hood frame illuminated by LED strips (luminance set to approximately $70lx$). In the proposed setup, images were pre-processed and standardized before feeding into a Convolutional Neural Network for identification.

1 PAPER • NO BENCHMARKS YET

AjwaOrMedjool

AjwaOrMedjool (AjwaOrMedjool: a binary balanced dataset to teach machine learning‏)

The dataset contains three subsets:

1 PAPER • NO BENCHMARKS YET

ArtDL

ArtDL is a novel painting data set for iconography classification composed of images collected from online sources. Most of the paintings are from the Renaissance period and depict scenes or characters of Christian art. The data set is annotated with classes representing specific characters belonging to the Iconclass classification system.

1 PAPER • 1 BENCHMARK

BankNote-Net

Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including currency recognition. To aid with this task, we present BankNote-Net, an open dataset for assistive currency recognition. The dataset consists of a total of 24,816 embeddings of banknote images captured in a variety of assistive scenarios, spanning 17 currencies and 112 denominations. These compliant embeddings were learned using supervised contrastive learning and a MobileNetV2 architecture, and they can be used to train and test specialized downstream models for any currency, including those not covered by our dataset or for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the last version of the Seeing AI app developed by Microsoft, which has over a 100 thousand monthly active users.

1 PAPER • NO BENCHMARKS YET

CNFOOD-241

Contains a dataset of 241 Chinese dishes with 191,811 images. There are 170843 images in the training set and 20943 images in the validation set. All images are resized to 600x600. As some of the images in the dataset are from ChineseFoodNet, they are not supported for commercial use.

1 PAPER • NO BENCHMARKS YET

CNFOOD-241-Chen

CNFOOD-241 Contains a dataset of 241 Chinese dishes with 191,811 images. There are 170843 images in the training set and 20943 images in the validation set. All images are resized to 600x600. As some of the images in the dataset are from ChineseFoodNet, they are not supported for commercial use. CNFOOD-241-Chen is the CNFOOD-241 dataset spilt with the list introduced in the paper "Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning," which has random split as train, val, test three parts.

1 PAPER • 1 BENCHMARK

Cervix93 Cytology Dataset

The dataset has 93 image stacks and their corresponding Extended Depth of Field (EDF) image acquired from cases with grades Nagative, LSIL or HSIL (The Bethesda System): - Negative: 16 - LSIL: 46 - HSIL: 31 The ground truth includes the grade labels for each frame and manually marked points inside cervical cells in each frame. There are in total 2705 manually marked points inside all frames: - Negative: 238 - LSIL: 1536 - HSIL: 931

1 PAPER • NO BENCHMARKS YET

Cross-View Time Dataset (Cross-Camera Split)

The standard evaluation protocol of Cross-View Time dataset allows for certain cameras to be shared between training and testing sets. This protocol can emulate scenarios in which we need to verify the authenticity of images from a particular set of devices and locations. Considering the ubiquity of surveillance systems (CCTV) nowadays, this is a common scenario, especially for big cities and high visibility events (e.g., protests, musical concerts, terrorist attempts, sports events). In such cases, we can leverage the availability of historical photographs of that device and collect additional images from previous days, months, and years. This would allow the model to better capture the particularities of how time influences the appearance of that specific place, probably leading to a better verification accuracy. However, there might be cases in which data is originated from heterogeneous sources, such as social media. In this sense, it is essential that models are optimized on camer

1 PAPER • 1 BENCHMARK

DF20 - Mini (Danish Fungi 2020 - Mini)

Danish Fungi 2020 (DF20) is a novel fine-grained dataset and benchmark. The dataset, constructed from observations submitted to the Danish Fungal Atlas, is unique in its taxonomy-accurate class labels, small number of errors, highly unbalanced long-tailed class distribution, rich observation metadata, and well-defined class hierarchy. DF20 has zero overlap with ImageNet, allowing unbiased comparison of models fine-tuned from publicly available ImageNet checkpoints.

1 PAPER • 1 BENCHMARK

DIB-10K

DIB-10K (DongNiao International Birds 10000)

Is a challenging image dataset which has more than 10 thousand different types of birds. It was created to enable the study of machine learning and also ornithology research.

1 PAPER • NO BENCHMARKS YET

Datasets

240 dataset results for Image Classification