🔔 Share your dataset with the ML community!

Filter by Modality (clear)

Filter by Task

Filter by Language (clear)

14 dataset results for Medical AND Chinese

The CheXpert dataset contains 224,316 chest radiographs of 65,240 patients with both frontal and lateral views available. The task is to do automated chest x-ray interpretation, featuring uncertainty labels and radiologist-labeled reference standard evaluation sets.

510 PAPERS • 1 BENCHMARK

LUNA16

The LUNA16 (LUng Nodule Analysis) dataset is a dataset for lung segmentation. It consists of 1,186 lung nodules annotated in 888 CT scans.

83 PAPERS • 1 BENCHMARK

LiTS17 (Liver Tumor Segmentation Challenge 2017)

LiTS17 is a liver tumor segmentation benchmark. The data and segmentations are provided by various clinical sites around the world. The training data set contains 130 CT scans and the test data set 70 CT scans. Image Source: https://arxiv.org/pdf/1707.07734.pdf

39 PAPERS • 3 BENCHMARKS

CVC-ClinicDB

CVC-ClinicDB is an open-access dataset of 612 images with a resolution of 384×288 from 31 colonoscopy sequences.It is used for medical image segmentation, in particular polyp detection in colonoscopy videos.

38 PAPERS • 1 BENCHMARK

SLAKE

SLAKE is an English-Chinese bilingual dataset consisting of 642 images and 14,028 question-answer pairs for training and testing Med-VQA systems.

30 PAPERS • 1 BENCHMARK

ISIC 2018 Task 1

The ISIC 2018 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. This Task 1 dataset is the challenge on lesion segmentation. It includes 2594 images.

22 PAPERS • 1 BENCHMARK

MS-CXR (Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing)

The MS-CXR dataset provides 1162 image–sentence pairs of bounding boxes and corresponding phrases, collected across eight different cardiopulmonary radiological findings, with an approximately equal number of pairs for each finding. This dataset complements the existing MIMIC-CXR v.2 dataset and comprises: 1. Reviewed and edited bounding boxes and phrases (1026 pairs of bounding box/sentence); and 2. Manual bounding box labels from scratch (136 pairs of bounding box/sentence).e

22 PAPERS • NO BENCHMARKS YET

Chaoyang

Chaoyang dataset contains 1111 normal, 842 serrated, 1404 adenocarcinoma, 664 adenoma, and 705 normal, 321 serrated, 840 adenocarcinoma, 273 adenoma samples for training and testing, respectively. This noisy dataset is constructed in the real scenario.

12 PAPERS • 2 BENCHMARKS

CMeEE

CMeEE (Chinese Medical Named Entity Recognition Dataset)

Chinese Medical Named Entity Recognition, a dataset first released in CHIP20204, is used for CMeEE task. Given a pre-defined schema, the task is to identify and extract entities from the given sentence and classify them into nine categories: disease, clinical manifestations, drugs, medical equipment, medical procedures, body, medical examinations, microorganisms, and department.

8 PAPERS • 1 BENCHMARK

MedDialog

The MedDialog dataset (Chinese) contains conversations (in Chinese) between doctors and patients. It has 1.1 million dialogues and 4 million utterances. The data is continuously growing and more dialogues will be added. The raw dialogues are from haodf.com. All copyrights of the data belong to haodf.com.

6 PAPERS • NO BENCHMARKS YET

MCSCSet

MCSCSet is a large-scale specialist-annotated dataset, designed for the task of Medical-domain Chinese Spelling Correction that contains about 200k samples. MCSCSet involves: i) extensive real-world medical queries collected from Tencent Yidian, ii) corresponding misspelled sentences manually annotated by medical specialists.

2 PAPERS • NO BENCHMARKS YET

CMeIE

CMeIE (Chinese Medical Information Extraction Dataset)

Chinese Medical Information Extraction, a dataset that is also released in CHIP2020, is used for CMeIE task. The task is aimed at identifying both entities and relations in a sentence following the schema constraints. There are 53 relations defined in the dataset, including 10 synonymous sub-relationships and 43 other sub-relationships.

1 PAPER • 1 BENCHMARK

HTDM

HTDM (Hypertention Disease Medication)

Hypertention Disease Medication dataset.

1 PAPER • NO BENCHMARKS YET

LPBA40

LPBA40 (LONI Probabilistic Brain Atlas)

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

1 PAPER • NO BENCHMARKS YET

Datasets

14 dataset results for Medical AND Chinese