The CheXpert dataset contains 224,316 chest radiographs of 65,240 patients with both frontal and lateral views available. The task is to do automated chest x-ray interpretation, featuring uncertainty labels and radiologist-labeled reference standard evaluation sets.
510 PAPERS • 1 BENCHMARK
The LUNA16 (LUng Nodule Analysis) dataset is a dataset for lung segmentation. It consists of 1,186 lung nodules annotated in 888 CT scans.
83 PAPERS • 1 BENCHMARK
LiTS17 is a liver tumor segmentation benchmark. The data and segmentations are provided by various clinical sites around the world. The training data set contains 130 CT scans and the test data set 70 CT scans. Image Source: https://arxiv.org/pdf/1707.07734.pdf
39 PAPERS • 3 BENCHMARKS
CVC-ClinicDB is an open-access dataset of 612 images with a resolution of 384×288 from 31 colonoscopy sequences.It is used for medical image segmentation, in particular polyp detection in colonoscopy videos.
38 PAPERS • 1 BENCHMARK
SLAKE is an English-Chinese bilingual dataset consisting of 642 images and 14,028 question-answer pairs for training and testing Med-VQA systems.
30 PAPERS • 1 BENCHMARK
The ISIC 2018 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. This Task 1 dataset is the challenge on lesion segmentation. It includes 2594 images.
22 PAPERS • 1 BENCHMARK
The MS-CXR dataset provides 1162 image–sentence pairs of bounding boxes and corresponding phrases, collected across eight different cardiopulmonary radiological findings, with an approximately equal number of pairs for each finding. This dataset complements the existing MIMIC-CXR v.2 dataset and comprises: 1. Reviewed and edited bounding boxes and phrases (1026 pairs of bounding box/sentence); and 2. Manual bounding box labels from scratch (136 pairs of bounding box/sentence).e
22 PAPERS • NO BENCHMARKS YET
Chaoyang dataset contains 1111 normal, 842 serrated, 1404 adenocarcinoma, 664 adenoma, and 705 normal, 321 serrated, 840 adenocarcinoma, 273 adenoma samples for training and testing, respectively. This noisy dataset is constructed in the real scenario.
12 PAPERS • 2 BENCHMARKS
Chinese Medical Named Entity Recognition, a dataset first released in CHIP20204, is used for CMeEE task. Given a pre-defined schema, the task is to identify and extract entities from the given sentence and classify them into nine categories: disease, clinical manifestations, drugs, medical equipment, medical procedures, body, medical examinations, microorganisms, and department.
8 PAPERS • 1 BENCHMARK
The MedDialog dataset (Chinese) contains conversations (in Chinese) between doctors and patients. It has 1.1 million dialogues and 4 million utterances. The data is continuously growing and more dialogues will be added. The raw dialogues are from haodf.com. All copyrights of the data belong to haodf.com.
6 PAPERS • NO BENCHMARKS YET
MCSCSet is a large-scale specialist-annotated dataset, designed for the task of Medical-domain Chinese Spelling Correction that contains about 200k samples. MCSCSet involves: i) extensive real-world medical queries collected from Tencent Yidian, ii) corresponding misspelled sentences manually annotated by medical specialists.
2 PAPERS • NO BENCHMARKS YET
Chinese Medical Information Extraction, a dataset that is also released in CHIP2020, is used for CMeIE task. The task is aimed at identifying both entities and relations in a sentence following the schema constraints. There are 53 relations defined in the dataset, including 10 synonymous sub-relationships and 43 other sub-relationships.
1 PAPER • 1 BENCHMARK
Hypertention Disease Medication dataset.
1 PAPER • NO BENCHMARKS YET
Click to add a brief description of the dataset (Markdown and LaTeX enabled).