Towards Understanding Data Values: Empirical Results on Synthetic Data

29 Sep 2021 · Danilo Brajovic, Omar de Mitri, Alex Windberger, Marco Huber ·

Understanding the influence of data on machine learning models is an emerging research field. Inspired by recent work in data valuation, we perform several experiments to get an intuition for this influence on a multi-layer perceptron. We generate a synthetic two-dimensional data set to visualize how different valuation methods value data points on a mesh grid spanning the relevant feature space. In this setting, individual data values can be derived directly from the impact of the respective data points on the decision boundary. Our results show that the most important data points are the miss-classified ones. Furthermore, despite performance differences on real world data sets, all investigated methods except one qualitatively agree on the data values derived from our experiments. Finally, we place our results into the recent literature and discuss data values and their relationship to other methods.

PDF Abstract