SHAPES is a dataset of synthetic images designed to benchmark systems for understanding of spatial and logical relations among multiple objects. The dataset consists of complex questions about arrangements of colored shapes. The questions are built around compositions of concepts and relations, e.g. Is there a red shape above a circle? or Is a red shape blue?. Questions contain between two and four attributes, object types, or relationships. There are 244 questions and 15,616 images in total, with all questions having a yes and no answer (and corresponding supporting image). This eliminates the risk of learning biases.
66 PAPERS • 1 BENCHMARK
PenDigits is a handwritten digits database created by collecting 250 samples from 44 writers. The samples written by 30 writers are used for training, cross-validation and writer dependent testing, and the digits written by the other 14 are used for writer independent testing.
11 PAPERS • 3 BENCHMARKS
The PhysioNet Challenge 2012 dataset is publicly available and contains the de-identified records of 8000 patients in Intensive Care Units (ICU). Each record consists of roughly 48 hours of multivariate time series data with up to 37 features recorded at various times from the patients during their stay such as respiratory rate, glucose etc.
11 PAPERS • 4 BENCHMARKS
The UCR Time Series Archive - introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers making use of at least one data set from the archive. The original incarnation of the archive had sixteen data sets but since that time, it has gone through periodic expansions. The last expansion took place in the summer of 2015 when the archive grew from 45 to 85 data sets. This paper introduces and will focus on the new data expansion from 85 to 128 data sets. Beyond expanding this valuable resource, this paper offers pragmatic advice to anyone who may wish to evaluate a new algorithm on the archive. Finally, this paper makes a novel and yet actionable claim: of the hundreds of papers that show an improvement over the standard baseline (1-nearest neighbor classification), a large fraction may be misattributing the reasons for their improvement. Moreover, they may have been able to achieve the same improvement with a
9 PAPERS • NO BENCHMARKS YET