Defining Big Data Analytics Benchmarks for Next Generation Supercomputers

6 Nov 2018 · Drew Schmidt, Junqi Yin, Michael Matheson, Bronson Messer, Mallikarjun Shankar

The design and construction of high performance computing (HPC) systems rely on exhaustive performance analysis and benchmarking. Traditionally, this activity has been geared exclusively towards simulation scientists, who, unsurprisingly, have been the primary customers of HPC for decades. However, a large and growing volume of data science work requires these large-scale resources, and calls for inclusion of, and investment in, data for HPC have been increasing accordingly. Designing a next-generation HPC platform therefore requires HPC-amenable big data analytics benchmarks. In this paper, we propose a set of big data analytics benchmarks and sample codes designed for testing the capabilities of current and next-generation supercomputers.
