Seven different types of dry beans were used in this research, taking into account the features such as form, shape, type, and structure by the market situation. A computer vision system was developed to distinguish seven different registered varieties of dry beans with similar features in order to obtain uniform seed classification. For the classification model, images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. Bean images obtained by computer vision system were subjected to segmentation and feature extraction stages, and a total of 16 features; 12 dimensions and 4 shape forms, were obtained from the grains.
1 PAPER • NO BENCHMARKS YET
A SAR version of the EuroSAT dataset. The images were collected from Sentinel-1 GRD products (two bands VV and VH) based on the geocoordinates of the EuroSAT images.
1 PAPER • 1 BENCHMARK
The FathomNet2023 competition dataset is a subset of the broader FathomNet marine image repository. The training and test images for the competition were all collected in the Monterey Bay Area between the surface and 1300 meters depth by the Monterey Bay Aquarium Research Institute. The images contain bounding box annotations of 290 categories of bottom dwelling animals. The training and validation data are split across an 800 meter depth threshold: all training data is collected from 0-800 meters, evaluation data comes from the whole 0-1300 meter range. Since an organisms' habitat range is partially a function of depth, the species distributions in the two regions are overlapping but not identical. Test images are drawn from the same region but may come from above or below the depth horizon. The competition goal is to label the animals present in a given image (i.e. multi-label classification) and determine whether the image is out-of-sample.
In this work, we propose a novel remote sensing dataset, FireRisk, consisting of 7 fire risk classes with a total of 91 872 labelled images for fire risk assessment. This remote sensing dataset is labelled with the fire risk classes supplied by the Wildfire Hazard Potential (WHP) raster dataset, and remote sensing images are collected using the National Agriculture Imagery Program (NAIP), a high-resolution remote sensing imagery program. On FireRisk, we present benchmark performance for supervised and self-supervised representations, with Masked Autoencoders (MAE) pre-trained on ImageNet1k achieving the highest classification accuracy, 65.29%.
This collection compiles anonymous radiographs, which have been arbitrarly selected from routine at the Department of Diagnostic Radiology, Aachen University of Technology (RWTH), Aachen, Germany. The imagery represents different ages, genders, view positions and pathologies. Therefore, image quality varies significantly. All images were downscaled to fit into a 512 x 512 bounding box maintaining the original aspect ratio. All images were classified according to the IRMA code. Based on this code, 193 categories were defined. For 12,677 images, these categories are provided. The remaining 1,733 images without code are used as test data for the ImageCLEFmed 2009 competition.
ISBNet is a dataset of images of recyclables. It is hand collected by our group at the International School of Beijing. The trash in these images was gathered from trash bins around the school. ISBNet totals 889 images distributed across 5 classes: cans (74), landfill (410), paper (182), plastic (122), and tetra pak (101). The data acquisition process involved using a piece of black poster paper as a background; this would create enough contrast for trash belonging to the paper category. These pictures were taken with an iPhone 8 and an iPhone XS. We recorded the trash bin in which the piece of trash originated from and any trash generating landmarks nearby. Please refer to the paper (ThanosNet: A Novel Trash Classification Method Using Metadata) for more about the format of the metadata.
Icon645 is a large-scale dataset of icon images that cover a wide range of objects:
This ImageNet version contains only 50 training images per class while the original testing set remains unchanged. It is one of the datasets comprising the data-efficient image classification (DEIC) benchmark. It was proposed to challenge the generalization capabilities of modern image classifiers.
There was no predefined dataset of party symbols to be usedas a benchmark. We curated a dataset from various nationaland regional websites owned by the ECI. The dataset consists of symbols (image files) of 49 National and State registered parties approved by the ECI. For each image of theoriginal party symbol, 18 different distortions and transformations were created as variations to the training data. Each image is of the dimension 180 x 180. The final labeled dataset consists of 931 images of party symbols with their corresponding party names as the labels.
Iran's Built Heritage Binary Image Classification Dataset contains approximately 10,500 CHB images gathered from four different sources:
An ancestral origin database of 14,000 images of individuals from East Asia, the Indian subcontinent, sub-Saharan Africa, and Western Europe.
It is composed of around 770k of color 256x256 RGB images extracted from the European Union Intellectual Property Office (EUIPO) open registry. Each of them is associated to multiple labels that classify the figurative and textual elements that appear in the images. These annotations have been classified by the EUIPO evaluators using the Vienna classification, a hierarchical classification of figurative marks.
MapReader in GeoHumanities workshop (SIGSPATIAL 2022): Gold standards and outputs
This is the large version of the MuMiN dataset.
This is the medium version of the MuMiN dataset.
This is the small version of the MuMiN dataset.
This dataset is recreated using offline augmentation from the original dataset. The original dataset can be found on this github repo. This dataset consists of about 87K rgb images of healthy and diseased crop leaves which is categorized into 38 different classes. The total dataset is divided into 80/20 ratio of training and validation set preserving the directory structure. A new directory containing 33 test images is created later for prediction purpose.
The AASL-Clear dataset is a collection of RGB images featuring Arabic alphabet sign Language gestures with backgrounds removed. Each image in this dataset showcases clear, isolated hand gestures, allowing for precise recognition and analysis of Arabic sign language alphabets. With transparent backgrounds, this dataset provides a clean and focused resource for training deep learning models in the domain of Arabic sign language recognition and classification.
Number of images: 1,657 images during or after the fire
A high-resolution multi-sensor remote sensing scene classification dataset, appropriate for training and evaluating image classification models in the remote sensing domain.
The Prima head pose dataset consists of 2790 images of 15 persons recorded twice. Pitch values lie in the interval [−60∘,60∘], and yaw values lie in the interval [−90∘,90∘] with a 15∘ step. Thus, there are 93 poses available for each person. All the recordings were achieved with the same background. One interesting feature of this dataset is the pose space is uniformly sampled. The dataset is annotated such that a face bounding box (manually annotated) and the corresponding yaw and pitch angle values are provided for each sample.
Photozilla is a large-scale dataset which includes over 990k images belonging to 10 different photographic styles. The dataset can be used to train classification models to automatically classify the images into the relevant style.
RGB Arabic Alphabet Sign Language (AASL) dataset
This paper introduces the RGB Arabic Alphabet Sign Language (AASL) dataset. AASL comprises 7,856 raw and fully labeled RGB images of the Arabic sign language alphabets, which to our best knowledge is the first publicly available RGB dataset. The dataset is aimed to help those interested in developing real-life Arabic sign language classification models. AASL was collected from more than 200 participants and with different settings such as lighting, background, image orientation, image size, and image resolution. Experts in the field supervised, validated and filtered the collected images to ensure a high-quality dataset. AASL is made available to the public on Kaggle.
Our proposed Synthetic-to-Real benchmark for more practical visual DA (termed S2RDA) includes two challenging transfer tasks of S2RDA-49 and S2RDA-MS-39. In each task, source/synthetic domain samples are synthesized by rendering 3D models from ShapeNet. The used 3D models are in the same label space as the target/real domain and each class has 12K rendered RGB images. The real domain of S2RDA-49 comprises 60,535 images of 49 classes, collected from ImageNet validation set, ObjectNet, VisDA-2017 validation set, and the web. For S2RDA-MS-39, the real domain collects 41,735 natural images exclusive for 39 classes from MetaShift, which contain complex and distinct contexts, e.g., object presence (co-occurrence of different objects), general contexts (indoor or outdoor), and object attributes (color or shape), leading to a much harder task. Compared to VisDA-2017, our S2RDA contains more categories, more realistically synthesized source domain data coming for free, and more complicated targ
While convolutions are known to be invariant to (discrete) translations, scaling continues to be a challenge and most image recognition networks are not invariant to them. To explore these effects, we have created the Scaled and Translated Image Recognition (STIR) dataset. This dataset contains objects of size $s \in [17, 64]$, each randomly placed in a $64 \times 64$ pixel image.
The social vision and language dataset is a large-scale multimodal dataset designed for research into social contextual learning.
SolarDK is a dataset for the detection and localization of solar. It comprises images from GeoDanmark with a variable Ground Sample Distance (GSD) between 10 cm and 15 cm, all sampled between March 1st and May 1st during 2021, containing 23,417 hand labelled images for classification and 880 segmentation masks, in addition to a set of about 100,000+ images for classification covering most variations of Danish urban and rural landscapes.
A public open dataset of synthetic chest X-ray images of COVID-19.
TEM image dataset containing four nanowire morphologies of bio-derived protein nanowires and synthetic peptide nanowires.
Tsinghua Dogs is a fine-grained classification dataset for dogs, over 65% of whose images are collected from people's real life. Each dog breed in the dataset contains at least 200 images and a maximum of 7,449 images, basically in proportion to their frequency of occurrence in China, so it significantly increases the diversity for each breed over existing dataset. Furthermore, Tsinghua Dogs annotated bounding boxes of the dog’s whole body and head in each image, which can be used for supervising the training of learning algorithms as well as testing them.
The YFCC100M Fine-Grained Geolocation dataset is a subset of 100 a set of 36,146 YFCC100M images that had Flickr tags that could be identified as corresponding to one of the labels in the iNaturalist 2017 dataset. The 36,146 images that were selected so have the following characteristics: the image must have geolocation available, the image must have at most one iNaturalist label, at most ten examples were retained for each label.
The iNaturalist Fine-Grained Geolocation dataset is an extension of the iNaturalist dataset with complementary geolocation information.
Easily generate simple continual learning benchmarks. Inspired by dSprites.
topex-printer is a dataset containing 102 machine parts of a label printing machine. It includes these parts for two domains, real photos and CAD rendered models.
ADFI Dataset is an image dataset for anomaly detection methods with a focus on industrial inspection. Each category sub dataset comprises a training set of images and a test set of images with various kinds of defects as well as images without defects.
0 PAPER • NO BENCHMARKS YET
Dataset contains images with apples infected by scab. The images are grouped in two folders: "Healthy" and "Scab". The collection of digital images were carried out in different locations of Latvia. Digital images with characteristic scab symptoms on fruits were collected by the Institute of Horticulture (LatHort) under project "lzp-2019/1-0094 Application of deep learning and datamining for the study of plant-pathogen interaction: the case of apple and pear scab" with a goal to create mobile application for apple scab detection using convolution neural networks. Devices: smartphone cameras (12 MP, 13 MP, 48 MP) and a digital compact camera (10 MP). The collection of images was carried out in field conditions, in orchards. The images were taken at three different stages of the day - in the morning (9:00-10:00), around noon (12:00-14:00), as well as in the evening (16:00-17:00) to provide a variety of natural light conditions. The images were also taken on both sunny days and overcast d
Dataset contains images with apple leaves infected by scab. The images are grouped in two folders: "Healthy" and "Scab". The collection of digital images were carried out in different locations of Latvia. Digital images with characteristic scab symptoms on leaves were collected by the Institute of Horticulture (LatHort) under project "lzp-2019/1-0094 Application of deep learning and datamining for the study of plant-pathogen interaction: the case of apple and pear scab" with a goal to create mobile application for apple scab detection using convolution neural networks. Devices: smartphone cameras (12 MP, 13 MP, 48 MP) and a digital compact camera (10 MP). The collection of images was carried out in field conditions, in orchards. The images were taken at three different stages of the day - in the morning (9:00-10:00), around noon (12:00-14:00), as well as in the evening (16:00-17:00) to provide a variety of natural light conditions. The images were also taken on both sunny days and over
The dataset contains concrete images having cracks. The data is collected from various METU Campus Buildings. The dataset is divided into two as negative and positive crack images for image classification. Each class has 20000images with a total of 40000 images with 227 x 227 pixels with RGB channels. The dataset is generated from 458 high-resolution images (4032x3024 pixel) with the method proposed by Zhang et al (2016). High-resolution images have variance in terms of surface finish and illumination conditions. No data augmentation in terms of random rotation or flipping is applied.
Ancient books script identification of Chinese ethnic minorities with deep convolutional neural networks via multi-branch and spatial pyramid pooling
We present a cellular microscopic image dataset for investigating channel-adaptive models. We collected and pre-processed images from three publicly available sources: 1) the WTC-11 hiPSC dataset from the Allen Institute (Viana et al., 2023), 2) the Human Protein Atlas dataset (Thul et al., 2017), and 3) a combined Cell Painting dataset from the Broad Institute (Gustafsdottir et al., 2013; Bray et al., 2017; Way et al., 2021). These images contain 3, 4, or 5 channels with different cellular structures highlighted in each channel. The goal of this dataset is to facilitate the creation and evaluation of novel computer vision models that are invariant to channel numbers.
This dataset is the images of corn seeds considering the top and bottom view independently (two images for one corn seed: top and bottom). There are four classes of the corn seed (Broken-B, Discolored-D, Silkcut-S, and Pure-P) 17802 images are labeled by the experts at the AdTech Corp. and 26K images were unlabeled out of which 9k images were labeled using the Active Learning (BatchBALD)
This dataset contains 16,000 images of four shapes; square, star, circle, and triangle. Each image is 200x200 pixels.
A dataset of all Moroccan money
Mudestreda Multimodal Device State Recognition Dataset obtained from real industrial milling device with Time Series and Image Data for Classification, Regression, Anomaly Detection, Remaining Useful Life (RUL) estimation, Signal Drift measurement, Zero Shot Flank Took Wear, and Feature Engineering purposes.
The study showed that the apple scab can be detected in the high-resolution RGB images in an early stage of its development. If two datasets, the early and advanced stages, are grouped together, the scab in the early stage is not visible after image resizing for CNN inputs 200-500px.