A labeled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable files. The dataset includes features extracted from 1.1M binary files: 900K training samples (300K malicious, 300K benign, 300K unlabeled) and 200K test samples (100K malicious, 100K benign).
96 PAPERS • NO BENCHMARKS YET
The KIT Motion-Language dataset links human motion and natural language.
30 PAPERS • 2 BENCHMARKS
Useful for two applications: automatic readability assessment and automatic text simplification. The corpus consists of 189 texts, each in three versions (567 in total).
27 PAPERS • NO BENCHMARKS YET
A novel large dataset of social media posts from users with one or more mental health conditions, along with posts from matched control users.
16 PAPERS • NO BENCHMARKS YET
NCBI Datasets simplifies gathering data from various NCBI databases. It gives researchers, scientists, and bioinformaticians efficient access to sequence information, annotations, and metadata for genes and genomes.
10 PAPERS • NO BENCHMARKS YET