The LFW dataset contains 13,233 images of faces collected from the web. This dataset consists of the 5749 identities with 1680 people with two or more images. In the standard LFW evaluation protocol the verification accuracies are reported on 6000 face pairs.
784 PAPERS • 13 BENCHMARKS
The CASIA-WebFace dataset is used for face verification and face identification tasks. The dataset contains 494,414 face images of 10,575 real identities collected from the web.
386 PAPERS • 2 BENCHMARKS
The MS-Celeb-1M dataset is a large-scale face recognition dataset consists of 100K identities, and each identity has about 100 facial images. The original identity labels are obtained automatically from webpages.
246 PAPERS • NO BENCHMARKS YET
The Extended Yale B database contains 2414 frontal-face images with size 192×168 over 38 subjects and about 64 images per subject. The images were captured under different lighting conditions and various facial expressions.
180 PAPERS • 1 BENCHMARK
MORPH is a facial age estimation dataset, which contains 55,134 facial images of 13,617 subjects ranging from 16 to 77 years old.
169 PAPERS • 8 BENCHMARKS
The IJB-B dataset is a template-based face dataset that contains 1845 subjects with 11,754 images, 55,025 frames and 7,011 videos where a template consists of a varying number of still images and video frames from different sources. These images and videos are collected from the Internet and are totally unconstrained, with large variations in pose, illumination, image quality etc. In addition, the dataset comes with protocols for 1-to-1 template-based face verification, 1-to-N template-based open-set face identification, and 1-to-N open-set video face identification.
143 PAPERS • 5 BENCHMARKS
The Adience dataset, published in 2014, contains 26,580 photos across 2,284 subjects with a binary gender label and one label from eight different age groups, partitioned into five splits. The key principle of the data set is to capture the images as close to real world conditions as possible, including all variations in appearance, pose, lighting condition and image quality, to name a few.
115 PAPERS • 6 BENCHMARKS
The VGG Face dataset is face identity recognition dataset that consists of 2,622 identities. It contains over 2.6 million images.
88 PAPERS • NO BENCHMARKS YET
Partial REID is a specially designed partial person reidentification dataset that includes 600 images from 60 people, with 5 full-body images and 5 occluded images per person. These images were collected on a university campus by 6 cameras from different viewpoints, backgrounds and different types of occlusion. The examples of partial persons in the Partial REID dataset are shown in the Figure.
39 PAPERS • NO BENCHMARKS YET
CASIA-FASD is a small face anti-spoofing dataset containing 50 subjects.
37 PAPERS • NO BENCHMARKS YET
The color FERET database is a dataset for face recognition. It contains 11,338 color images of size 512×768 pixels captured in a semi-controlled environment with 13 different poses from 994 subjects.
34 PAPERS • 3 BENCHMARKS
UMDFaces is a face dataset divided into two parts:
30 PAPERS • NO BENCHMARKS YET
CelebA-Spoof is a large-scale face anti-spoofing dataset with the following properties:
26 PAPERS • NO BENCHMARKS YET
TinyFace is a large scale face recognition benchmark to facilitate the investigation of natively LRFR (Low Resolution Face Recognition) at large scales (large gallery population sizes) in deep learning. The TinyFace dataset consists of 5,139 labelled facial identities given by 169,403 native LR face images (average 20×16 pixels) designed for 1:N recognition test. All the LR faces in TinyFace are collected from public web data across a large variety of imaging scenarios, captured under uncontrolled viewing conditions in pose, illumination, occlusion and background.
23 PAPERS • 1 BENCHMARK
IMDb-Face is large-scale noise-controlled dataset for face recognition research. The dataset contains about 1.7 million faces, 59k identities, which is manually cleaned from 2.0 million raw images. All images are obtained from the IMDb website.
21 PAPERS • NO BENCHMARKS YET
The Replay-Mobile Database for face spoofing consists of 1190 video clips of photo and video attack attempts to 40 clients, under different lighting conditions. These videos were recorded with current devices from the market -- an iPad Mini2 (running iOS) and a LG-G4 smartphone (running Android). This Database was produced at the Idiap Research Institute (Switzerland) within the framework of collaboration with Galician Research and Development Center in Advanced Telecommunications - Gradiant (Spain).
18 PAPERS • NO BENCHMARKS YET
WebFace260M is a million-scale face benchmark, which is constructed for the research community towards closing the data gap behind the industry.
An evaluation protocol for face verification focusing on a large intra-pair image quality difference.
15 PAPERS • 1 BENCHMARK
DigiFace-1M is a synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline. It contains 1.22M images of 110K unique identities. The dataset consists of two parts. The first part contains 720K images with 10K identities. For each identity, 4 different sets of accessories are sampled and 18 images are rendered for each set. The second part contains 500K images with 100K identities. For each identity, only one set of accessories is sampled and only 5 images are rendered. Following the format of the existing datasets, we provide the aligned crop around the face, resized into $112 \times 112$ resolution.
14 PAPERS • NO BENCHMARKS YET
Real-World Masked Face Dataset (RMFD) is a large dataset for masked face detection.
11 PAPERS • NO BENCHMARKS YET
MeGlass is an eyeglass dataset originally designed for eyeglass face recognition evaluation. All the face images are selected and cleaned from MegaFace. Each identity has at least two face images with eyeglass and two face images without eyeglass. It contains 47,817 images from 1,710 different identities.
8 PAPERS • NO BENCHMARKS YET
Contains over 11000 images of 1000 identities with different types of disguise accessories. The dataset is collected from the Internet, resulting in unconstrained face images similar to real world settings.
7 PAPERS • 3 BENCHMARKS
Although deep face recognition has achieved impressive results in recent years, there is increasing controversy regarding racial and gender bias of the models, questioning their trustworthiness and deployment into sensitive scenarios. DemogPairs is a validation set with 10.8K facial images and 58.3M identity verification pairs, distributed in demographically-balanced folds of Asian, Black and White females and males. We also propose a benchmark of experiments using DemogPairs over state-of-the-art deep face recognition models in order to analyze their cross-demographic behavior and potential demographic biases (see figure below).
7 PAPERS • NO BENCHMARKS YET
The iCartoonFace dataset is a large-scale dataset that can be used for two different tasks: cartoon face detection and cartoon face recognition.
7 PAPERS • 1 BENCHMARK
ROF is a dataset for occluded face recognition that contains faces with both upper face occlusion, due to sunglasses, and lower face occlusion, due to masks.
5 PAPERS • NO BENCHMARKS YET
The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images. Our method relies on Spark AR Studio, a developer program made by Facebook that is used to create Instagram face filters. In our approach, we use 9 masks of different colors, shapes and fabrics. We employ our method to generate a number of 445,446 (90%) samples of masks for the CASIA-WebFace data set.
4 PAPERS • 1 BENCHMARK
The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on faces in the original images. Our method relies on Spark AR Studio, a developer program made by Facebook that is used to create Instagram face filters. In our approach, we use 9 masks of different colors, shapes and fabrics. We employ our method to generate a number of 196,254 (96.8%) masks for the CelebA data set.
We have cleaned the noisy IMDB-WIKI dataset using a constrained clustering method, resulting this new benchmark for in-the-wild age estimation. The annotations also allow this dataset to use for some other tasks, like gender classification and face recognition/verification. For more details, please refer to our FPAge paper.
3 PAPERS • 1 BENCHMARK
MCXFace is a heterogeneous face recognition dataset consisting of multi-channel image samples for 51 subjects. For each subject color (RGB), thermal, near-infrared (850 nm), short-wave infrared (1300 nm), Depth, Stereo depth, and depth estimated from RGB images are available. Overall 7406 images together with landmark annotations and standard protocols are available in this dataset.
3 PAPERS • NO BENCHMARKS YET
CASIA-Face-Africa is a face image database which contains 38,546 images of 1,183 African subjects. Multi-spectral cameras are utilized to capture the face images under various illumination settings. Demographic attributes and facial expressions of the subjects are also carefully recorded. For landmark detection, each face image in the database is manually labeled with 68 facial keypoints. A group of evaluation protocols are constructed according to different applications, tasks, partitions and scenarios. The proposed database along with its face landmark annotations, evaluation protocols and preliminary results form a good benchmark to study the essential aspects of face biometrics for African subjects, especially face image preprocessing, face feature analysis and matching, facial expression recognition, sex/age estimation, ethnic classification, face image generation, etc.
2 PAPERS • NO BENCHMARKS YET
Celeb-HQ Face Gender Recognition Dataset
Celeb-HQ Facial Identity Recognition Dataset
KANFace consists of 40K still images and 44K sequences (14.5M video frames in total) captured in unconstrained, real-world conditions from 1,045 subjects. The dataset is manually annotated in terms of identity, exact age, gender and kinship.
2 PAPERS • 1 BENCHMARK
Comprised of real human and wax figure images and videos that endorse the problem of face spoofing detection. The dataset consists of more than 1800 face images and 110 videos of 55 people/waxworks, arranged in training, validation and test sets with a large range in expression, illumination and pose variations.
"The Chicago Face Database was developed at the University of Chicago by Debbie S. Ma, Joshua Correll, and Bernd Wittenbrink. The CFD is intended for use in scientific research. It provides high-resolution, standardized photographs of male and female faces of varying ethnicity between the ages of 17-65. Extensive norming data are available for each individual model. These data include both physical attributes (e.g., face size) as well as subjective ratings by independent judges (e.g., attractiveness).
1 PAPER • NO BENCHMARKS YET
The Curated AFD dataset is a curated version of the Asian Face Dataset (AFD) for face recognition research. The original AFD dataset has been curated to remove wrong identity labels, duplicate images and duplicate subjects.
The proposed Extended-YouTube Faces (E-YTF) is an extension of the famous YouTube Faces (YTF) dataset and is specifically designed to further push the challenges of face recognition by addressing the problem of open-set face identification from heterogeneous data i.e. still images vs video.
FAD is a dataset that have roughly 200,000 attribute labels for the above traits, for over 10,000 facial images.
Indian Masked faces in the wild Database is collected into three sets:(i) Indian Celebrity, (ii) Instagram and (iii) Indian Crowd. The Indian Celebrity contains 40 Indian celebrities with 435 images, including Bollywood actors/actresses, television stars, sports personalities, and politicians. The Instagram set contains 377 images of 40 subjects downloaded from Instagram. We collected masked and non-masked images of Indian people with a public profile. The Indian Crowd set is collected from the common people who volunteered to contribute to the dataset. This set contains 120 subjects with 562 images. All the Images are collected in both constrained and unconstrained environments with variation in pose, illumination, background and masks worn by the people.
Consists of a large number of unconstrained multi-view and partially occluded faces.
WildestFaces is tailored to study cross-domain recognition under a variety of adverse conditions.
The scales of the data accessible through internet search engines can reach hundreds of millions, or even billions. The existence of such large weak-labeled databases has gained importance in the training of face recognition algorithms. Starting with the publicly available YFCC100M, we propose a weakly-labeled subset for multi-label face recognition for self-supervised methods. A 392K image subset of YFCC100M of 128x128 images was obtained by querying for the 40 facial attributes. We made this dataset publicly available.
Description: 1,995 People Face Images Data (Asian race). For each subject, more than 20 images per person with frontal face were collected. This data can be used for face recognition and other tasks.
0 PAPER • NO BENCHMARKS YET
Description: 23 Pairs of Identical Twins Face Image Data. The collecting scenes includes indoor and outdoor scenes. The subjects are Chinese males and females. The data diversity inlcudes multiple face angles, multiple face postures, close-up of eyes, multiple light conditions and multiple age groups. This dataset can be used for tasks such as twins' face recognition.
Description: 5,011 Images – Human Frontal face Data (Male). The data diversity includes multiple scenes, multiple ages and multiple races. This dataset includes 2,004 Caucasians , 3,007 Asians. This dataset can be used for tasks such as face detection, race detection, age detection, beard category classification.