The CheXpert dataset contains 224,316 chest radiographs of 65,240 patients, with both frontal and lateral views available. It targets automated chest X-ray interpretation and features uncertainty labels and radiologist-labeled reference-standard evaluation sets.
500 PAPERS • 1 BENCHMARK
ChestX-ray14 is a medical imaging dataset comprising 112,120 frontal-view X-ray images of 30,805 unique patients, collected from 1992 to 2015, with fourteen common disease labels text-mined from the associated radiological reports via NLP techniques. It expands on ChestX-ray8 by adding six thorax diseases: Consolidation, Edema, Emphysema, Fibrosis, Pleural Thickening and Hernia.
204 PAPERS • 5 BENCHMARKS
MIMIC-CXR from Massachusetts Institute of Technology presents 371,920 chest X-rays associated with 227,943 imaging studies from 65,079 patients. The studies were performed at Beth Israel Deaconess Medical Center in Boston, MA.
156 PAPERS • 2 BENCHMARKS
For each dataset we provide a short description as well as some characterization metrics. These include the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average Imbalance Ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep) and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is cardinality divided by the number of labels. Diversity is the number of distinct labelsets present in the dataset divided by the number of possible labelsets. The avgIR measures the average degree of imbalance across all labels; the greater the avgIR, the greater the imbalance of the dataset. Finally, rDep measures the proportion of label pairs that are dependent at 99% confidence. A broader description of all the characterization metrics and of the partition methods used is given in
4 PAPERS • NO BENCHMARKS YET
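The characterization metrics described above can be illustrated with a minimal sketch. It rests on several assumptions that the description does not state: the label matrix `Y` is a list of 0/1 rows, the number of possible labelsets is taken to be 2^q, and the imbalance ratio of a label is the count of the most frequent label divided by that label's count. The function name `label_metrics` and the toy data are hypothetical, and rDep is omitted because it needs a chi-square test.

```python
def label_metrics(Y, d):
    """Compute multi-label characterization metrics.

    Y: list of m rows, each a list of q binary (0/1) label indicators.
    d: number of attributes (passed in, since Y holds only labels).
    """
    m = len(Y)
    q = len(Y[0])

    # Cardinality: average number of labels per instance.
    card = sum(sum(row) for row in Y) / m

    # Density: cardinality divided by the number of labels.
    dens = card / q

    # Diversity: distinct labelsets observed over possible labelsets.
    # Assumption: "possible labelsets" means 2**q.
    distinct = len({tuple(row) for row in Y})
    div = distinct / 2 ** q

    # avgIR: mean over labels of (max label count / this label's count).
    # Assumption: labels that never occur are skipped to avoid dividing by zero.
    counts = [sum(row[j] for row in Y) for j in range(q)]
    present = [c for c in counts if c > 0]
    avg_ir = sum(max(present) / c for c in present) / len(present)

    return {
        "m": m,
        "q": q,
        "d": d,
        "card": card,
        "dens": dens,
        "div": div,
        "avg_ir": avg_ir,
        "complexity": m * q * d,  # m x q x d
    }


# Toy example: 4 instances, 3 labels, 5 attributes (made-up values).
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
metrics = label_metrics(Y, d=5)
print(metrics)
```

In this toy case the three labels occur 3, 2 and 1 times, so their imbalance ratios are 1, 1.5 and 3. Other conventions exist for the diversity denominator and for handling absent labels, so values reported elsewhere may differ from what this sketch produces.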