
Data Generators for Learning Systems Based on RBF Networks

Many problems offer only scarce and expensive data. We propose a generator of semi-artificial data with properties similar to those of the original data, which enables the development and testing of different data mining algorithms and the optimization of their parameters. The generated data allow large-scale experimentation and simulation without the danger of overfitting. The proposed generator is based on RBF networks, which learn sets of Gaussian kernels. These kernels can be used in generative mode to produce new data from the same distributions. To assess the quality of the generated data, we evaluated their statistical properties as well as their structural and predictive similarity to the original data, using supervised and unsupervised learning techniques. To determine the usability of the proposed generator, we conducted a large-scale evaluation on 51 UCI data sets. The results show considerable similarity between the original and generated data and indicate that the method can be useful in several development and simulation scenarios. We also analyze possible improvements in classification performance from adding different amounts of generated data to the training set, performance on high-dimensional data sets, and the conditions under which the proposed approach is successful.
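The abstract only outlines how learned Gaussian kernels can be run "in generative mode". The sketch below illustrates that idea under simplifying assumptions and is not the authors' implementation: instead of training an RBF network, it places class-specific Gaussian kernels with plain k-means and uses each cluster's per-dimension standard deviation as the kernel width. The helper names (`kmeans`, `fit_kernels`, `generate`, `class_means`) and parameters such as `k_per_class` and `min_sigma` are hypothetical.

```python
import math
import random
from collections import defaultdict


def kmeans(points, k, iters=20, rng=random):
    """Plain Lloyd's k-means; returns (centers, assignments)."""
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def fit_kernels(X, y, k_per_class=2, min_sigma=1e-3, rng=random):
    """Fit class-specific Gaussian kernels as (label, weight, center, sigma) tuples.

    Assumption: k-means clustering stands in for RBF network training here.
    """
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    kernels = []
    for label, pts in by_class.items():
        k = min(k_per_class, len(pts))
        centers, assign = kmeans(pts, k, rng=rng)
        for c in range(k):
            members = [pts[i] for i in range(len(pts)) if assign[i] == c]
            if not members:
                continue
            # Per-dimension standard deviation of the members is the kernel width.
            sigma = [
                max(min_sigma, math.sqrt(sum((v - m) ** 2 for v in col) / len(members)))
                for col, m in zip(zip(*members), centers[c])
            ]
            # Weight = number of covered instances, so class proportions carry over.
            kernels.append((label, len(members), centers[c], sigma))
    return kernels


def generate(kernels, n, rng=random):
    """Sample n labelled instances: pick a kernel by weight, then draw from its Gaussian."""
    weights = [w for _, w, _, _ in kernels]
    data = []
    for label, _, center, sigma in rng.choices(kernels, weights=weights, k=n):
        data.append(([rng.gauss(m, s) for m, s in zip(center, sigma)], label))
    return data


def class_means(data, label):
    """Per-feature mean of one class, a crude statistical-similarity check."""
    rows = [x for x, lab in data if lab == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Toy example: two well-separated 2-D classes.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
X += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(100)]
y = ["a"] * 100 + ["b"] * 100

kernels = fit_kernels(X, y, k_per_class=2, rng=rng)
synthetic = generate(kernels, 1000, rng=rng)
print(class_means(synthetic, "a"), class_means(synthetic, "b"))
```

Weighting each kernel by the number of training instances it covers roughly preserves the class proportions of the original data, and comparing per-class means is only a minimal stand-in for the statistical, structural and predictive similarity evaluations described in the abstract.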
