This collection compiles anonymized radiographs, arbitrarily selected from routine work at the Department of Diagnostic Radiology, Aachen University of Technology (RWTH), Aachen, Germany. The imagery covers different ages, genders, view positions, and pathologies, so image quality varies significantly. All images were downscaled to fit into a 512 x 512 bounding box while maintaining the original aspect ratio. All images were classified according to the IRMA code, and based on this code, 193 categories were defined. Category labels are provided for 12,677 images; the remaining 1,733 images without a code are used as test data for the ImageCLEFmed 2009 competition.
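The downscaling step described above (fit into a 512 x 512 bounding box while keeping the original aspect ratio) can be sketched as a small helper; the function name is illustrative, not part of the dataset's tooling:

```python
def fit_in_box(width, height, box=512):
    """Scale (width, height) so the image fits inside a box x box
    bounding box, preserving the aspect ratio; never upscale."""
    scale = min(box / width, box / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))

# A 2048 x 1536 radiograph becomes 512 x 384, keeping its 4:3 ratio,
# while a 300 x 200 image already inside the box is left unchanged.
print(fit_in_box(2048, 1536))  # (512, 384)
print(fit_in_box(300, 200))    # (300, 200)
```

Only the longer side is forced down to 512; the shorter side follows from the shared scale factor, which is what keeps the aspect ratio intact.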
ISBNet is a dataset of images of recyclables, hand-collected by our group at the International School of Beijing. The trash in these images was gathered from trash bins around the school. ISBNet totals 889 images distributed across 5 classes: cans (74), landfill (410), paper (182), plastic (122), and tetra pak (101). Images were acquired against a black poster-paper background, which provides enough contrast for trash belonging to the paper category, and were taken with an iPhone 8 and an iPhone XS. For each item we recorded the trash bin from which it originated and any trash-generating landmarks nearby. Please refer to the paper (ThanosNet: A Novel Trash Classification Method Using Metadata) for more about the format of the metadata.
Icon645 is a large-scale dataset of icon images covering a wide range of objects.
This ImageNet version contains only 50 training images per class while the original testing set remains unchanged. It is one of the datasets comprising the data-efficient image classification (DEIC) benchmark. It was proposed to challenge the generalization capabilities of modern image classifiers.
There was no predefined dataset of party symbols to be used as a benchmark, so we curated a dataset from various national and regional websites owned by the ECI. The dataset consists of symbols (image files) of 49 National and State registered parties approved by the ECI. For each original party-symbol image, 18 different distortions and transformations were created as variations to the training data. Each image has dimensions 180 x 180. The final labeled dataset consists of 931 images of party symbols with their corresponding party names as labels.
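The 18-variants-per-symbol scheme above can be sketched as follows. The actual distortions used for the party-symbol dataset are not specified here, so simple flips, rotations, and brightness shifts stand in as illustrative stand-ins, and the function name is hypothetical:

```python
import random

def make_variants(image, n_variants=18, seed=0):
    """Generate simple distortions of a grayscale image, given as a
    list of pixel rows. The transformations below are illustrative
    examples, not the ones used for the actual dataset."""
    rng = random.Random(seed)
    ops = [
        lambda im: [row[::-1] for row in im],                # horizontal mirror
        lambda im: [list(r) for r in zip(*im[::-1])],        # rotate 90 degrees
        lambda im: [[min(255, p + rng.randint(5, 40)) for p in row]
                    for row in im],                          # brighten
        lambda im: [[max(0, p - rng.randint(5, 40)) for p in row]
                    for row in im],                          # darken
    ]
    return [rng.choice(ops)(image) for _ in range(n_variants)]

symbol = [[0, 64], [128, 255]]          # toy 2 x 2 "symbol"
variants = make_variants(symbol)
# 1 original + 18 variants = 19 labeled images per party; 49 * 19 = 931.
print(len(variants))  # 18
```

This matches the arithmetic in the description: 49 parties with one original and 18 variations each gives the 931 labeled images.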
Iran's Built Heritage Binary Image Classification Dataset contains approximately 10,500 CHB images gathered from four different sources.
It is composed of around 770k color 256x256 RGB images extracted from the European Union Intellectual Property Office (EUIPO) open registry. Each image is associated with multiple labels that classify the figurative and textual elements appearing in it. These annotations were assigned by EUIPO evaluators using the Vienna classification, a hierarchical classification of figurative marks.
MapReader in GeoHumanities workshop (SIGSPATIAL 2022): Gold standards and outputs
This is the large version of the MuMiN dataset.
This is the medium version of the MuMiN dataset.
This is the small version of the MuMiN dataset.
This dataset was recreated from the original dataset using offline augmentation. The original dataset can be found in this GitHub repo. It consists of about 87K RGB images of healthy and diseased crop leaves, categorized into 38 different classes. The data are split 80/20 into training and validation sets, preserving the directory structure. A separate directory containing 33 test images was created later for prediction purposes.
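An 80/20 split that preserves the class-per-folder directory structure, as described above, can be sketched like this; the function name and `train`/`valid` folder names are illustrative assumptions:

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_dir, dst_dir, train_frac=0.8, seed=42):
    """Split a class-per-folder image dataset into train/valid subsets,
    preserving the directory structure (one subfolder per class)."""
    rng = random.Random(seed)
    for class_dir in sorted(Path(src_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        files = sorted(class_dir.iterdir())
        rng.shuffle(files)                     # deterministic given the seed
        cut = int(len(files) * train_frac)     # 80% of each class to train
        for subset, chunk in (("train", files[:cut]), ("valid", files[cut:])):
            out = Path(dst_dir) / subset / class_dir.name
            out.mkdir(parents=True, exist_ok=True)
            for f in chunk:
                shutil.copy2(f, out / f.name)
```

Splitting per class (rather than over the pooled file list) keeps the 80/20 ratio roughly stratified across all 38 classes.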
The AASL-Clear dataset is a collection of RGB images featuring Arabic Alphabet Sign Language gestures with backgrounds removed. Each image showcases a clear, isolated hand gesture, allowing for precise recognition and analysis of Arabic sign language alphabets. With transparent backgrounds, this dataset provides a clean and focused resource for training deep learning models in the domain of Arabic sign language recognition and classification.
Number of images: 1,657 (taken during or after the fire)
A high-resolution multi-sensor remote sensing scene classification dataset, appropriate for training and evaluating image classification models in the remote sensing domain.
The Prima head pose dataset consists of 2,790 images of 15 persons, each recorded twice. Pitch values lie in the interval [−60°, 60°] and yaw values in the interval [−90°, 90°], sampled with a 15° step; there are 93 poses available for each person. All recordings share the same background. One interesting feature of this dataset is that the pose space is uniformly sampled. Each sample is annotated with a manually drawn face bounding box and the corresponding yaw and pitch angle values.
Photozilla is a large-scale dataset which includes over 990k images belonging to 10 different photographic styles. The dataset can be used to train classification models to automatically classify the images into the relevant style.
RGB Arabic Alphabet Sign Language (AASL) dataset
This paper introduces the RGB Arabic Alphabet Sign Language (AASL) dataset. AASL comprises 7,856 raw, fully labeled RGB images of the Arabic sign language alphabets, which, to the best of our knowledge, is the first publicly available RGB dataset of its kind. The dataset aims to help those interested in developing real-life Arabic sign language classification models. AASL was collected from more than 200 participants under varied settings of lighting, background, image orientation, image size, and image resolution. Experts in the field supervised, validated, and filtered the collected images to ensure a high-quality dataset. AASL is made available to the public on Kaggle.
Our proposed Synthetic-to-Real benchmark for more practical visual DA (termed S2RDA) includes two challenging transfer tasks, S2RDA-49 and S2RDA-MS-39. In each task, source/synthetic-domain samples are synthesized by rendering 3D models from ShapeNet. The 3D models share the label space of the target/real domain, and each class has 12K rendered RGB images. The real domain of S2RDA-49 comprises 60,535 images of 49 classes, collected from the ImageNet validation set, ObjectNet, the VisDA-2017 validation set, and the web. For S2RDA-MS-39, the real domain collects 41,735 natural images exclusive to 39 classes from MetaShift, which contain complex and distinct contexts, e.g., object presence (co-occurrence of different objects), general contexts (indoor or outdoor), and object attributes (color or shape), leading to a much harder task. Compared to VisDA-2017, S2RDA contains more categories, more realistically synthesized source-domain data that comes for free, and more complicated target domains.
While convolutions are known to be equivariant to (discrete) translations, scaling remains a challenge, and most image recognition networks are not invariant to it. To explore these effects, we created the Scaled and Translated Image Recognition (STIR) dataset. It contains objects of size $s \in [17, 64]$, each randomly placed in a $64 \times 64$ pixel image.
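The sampling setup above can be sketched as follows; this is a minimal stand-in that places a plain square, whereas the real dataset uses rendered objects, and the function name is illustrative:

```python
import random

def make_sample(canvas=64, s_min=17, s_max=64, seed=None):
    """Place a square 'object' with random side s in [s_min, s_max]
    at a random position fully inside a canvas x canvas image."""
    rng = random.Random(seed)
    s = rng.randint(s_min, s_max)
    x = rng.randint(0, canvas - s)   # top-left corner; object stays in frame
    y = rng.randint(0, canvas - s)
    img = [[0] * canvas for _ in range(canvas)]
    for r in range(y, y + s):
        for c in range(x, x + s):
            img[r][c] = 255
    return img, s

img, s = make_sample(seed=0)
```

Because the corner is drawn from `[0, canvas - s]`, translation and scale vary independently, which is exactly what makes the dataset a probe for scale (non-)invariance.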
The social vision and language dataset is a large-scale multimodal dataset designed for research into social contextual learning.
SolarDK is a dataset for the detection and localization of solar panels. It comprises images from GeoDanmark with a variable Ground Sample Distance (GSD) between 10 cm and 15 cm, all sampled between March 1st and May 1st, 2021. The dataset contains 23,417 hand-labelled images for classification and 880 segmentation masks, in addition to a set of 100,000+ images for classification covering most variations of Danish urban and rural landscapes.
A public open dataset of synthetic chest X-ray images of COVID-19.
Tsinghua Dogs is a fine-grained classification dataset for dogs, with over 65% of its images collected from real-life settings. Each dog breed in the dataset contains at least 200 images and at most 7,449 images, roughly in proportion to its frequency of occurrence in China, which significantly increases the per-breed diversity over existing datasets. Furthermore, Tsinghua Dogs provides annotated bounding boxes for each dog's whole body and head in every image, which can be used both for supervising the training of learning algorithms and for testing them.
The YFCC100M Fine-Grained Geolocation dataset is a set of 36,146 YFCC100M images whose Flickr tags could be identified as corresponding to one of the labels in the iNaturalist 2017 dataset. The 36,146 images were selected to have the following characteristics: each image must have geolocation available, each image must have at most one iNaturalist label, and at most ten examples were retained per label.
The iNaturalist Fine-Grained Geolocation dataset is an extension of the iNaturalist dataset with complementary geolocation information.
Easily generate simple continual learning benchmarks. Inspired by dSprites.
topex-printer is a dataset containing 102 machine parts of a label printing machine. It includes these parts in two domains: real photos and CAD-rendered models.
ADFI Dataset is an image dataset for anomaly detection methods, with a focus on industrial inspection. Each category sub-dataset comprises a training set of images and a test set containing images with various kinds of defects as well as defect-free images.
Dataset contains images of apples infected by scab. The images are grouped in two folders: "Healthy" and "Scab". The digital images were collected in different locations of Latvia. Images with characteristic scab symptoms on fruits were collected by the Institute of Horticulture (LatHort) under the project "lzp-2019/1-0094 Application of deep learning and datamining for the study of plant-pathogen interaction: the case of apple and pear scab", with the goal of creating a mobile application for apple scab detection using convolutional neural networks. Devices: smartphone cameras (12 MP, 13 MP, 48 MP) and a digital compact camera (10 MP). Image collection was carried out in field conditions, in orchards. The images were taken at three different times of day - in the morning (9:00-10:00), around noon (12:00-14:00), and in the evening (16:00-17:00) - to provide a variety of natural light conditions. The images were also taken on both sunny and overcast days.
Dataset contains images of apple leaves infected by scab. The images are grouped in two folders: "Healthy" and "Scab". The digital images were collected in different locations of Latvia. Images with characteristic scab symptoms on leaves were collected by the Institute of Horticulture (LatHort) under the project "lzp-2019/1-0094 Application of deep learning and datamining for the study of plant-pathogen interaction: the case of apple and pear scab", with the goal of creating a mobile application for apple scab detection using convolutional neural networks. Devices: smartphone cameras (12 MP, 13 MP, 48 MP) and a digital compact camera (10 MP). Image collection was carried out in field conditions, in orchards. The images were taken at three different times of day - in the morning (9:00-10:00), around noon (12:00-14:00), and in the evening (16:00-17:00) - to provide a variety of natural light conditions. The images were also taken on both sunny and overcast days.
We present a cellular microscopic image dataset for investigating channel-adaptive models. We collected and pre-processed images from three publicly available sources: 1) the WTC-11 hiPSC dataset from the Allen Institute (Viana et al., 2023), 2) the Human Protein Atlas dataset (Thul et al., 2017), and 3) a combined Cell Painting dataset from the Broad Institute (Gustafsdottir et al., 2013; Bray et al., 2017; Way et al., 2021). These images contain 3, 4, or 5 channels, with a different cellular structure highlighted in each channel. The goal of this dataset is to facilitate the creation and evaluation of novel computer vision models that are invariant to the number of channels.
This dataset consists of images of corn seeds, with the top and bottom views captured independently (two images per seed: top and bottom). There are four classes of corn seed (Broken-B, Discolored-D, Silkcut-S, and Pure-P). 17,802 images were labeled by experts at AdTech Corp.; of the 26K unlabeled images, 9K were labeled using Active Learning (BatchBALD).
Mudestreda is a multimodal device state recognition dataset obtained from a real industrial milling device, with time series and image data for classification, regression, anomaly detection, remaining useful life (RUL) estimation, signal drift measurement, zero-shot flank tool wear, and feature engineering purposes.