Uncovering Coresets for Classification With Multi-Objective Evolutionary Algorithms

20 Feb 2020 · Pietro Barbiero, Giovanni Squillero, Alberto Tonda

A coreset is a subset of the training set with which a machine learning algorithm obtains performance similar to what it would deliver if trained on the whole original dataset. Coreset discovery is an active and open line of research, as it can speed up training and may help humans understand the results. Building on previous work, a novel approach is presented: candidate coresets are iteratively optimized by adding and removing samples. As there is an obvious trade-off between limiting the training set size and the quality of the results, a multi-objective evolutionary algorithm is used to simultaneously minimize the number of points in the set and the classification error. Experimental results on non-trivial benchmarks show that the proposed approach delivers coresets that allow a classifier to obtain lower error and better generalization on unseen data than those found by state-of-the-art coreset discovery techniques.
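
The bi-objective search described in the abstract can be illustrated with a small sketch. It is not the authors' EvoCore implementation: it swaps in a GSEMO-style archive loop (mutate a random non-dominated coreset by adding or removing a few samples, keep the child only if no archived coreset dominates it), uses a hand-written 1-nearest-neighbour classifier as the learner, and measures the error objective on a separate validation set. All function names (`one_nn_error`, `dominates`, `evolve_coresets`), parameters and toy data are hypothetical.

```python
import random

def one_nn_error(core, train_X, train_y, val_X, val_y):
    """Fraction of validation points misclassified by a 1-NN classifier
    that only sees the training samples whose indices are in `core`."""
    if not core:
        return 1.0  # an empty coreset cannot classify anything
    wrong = 0
    for x, y in zip(val_X, val_y):
        # index of the coreset member closest to x (squared Euclidean distance)
        nearest = min(core, key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
        if train_y[nearest] != y:
            wrong += 1
    return wrong / len(val_X)

def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all objectives minimized)."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def evolve_coresets(train_X, train_y, val_X, val_y, generations=500, seed=0):
    """Hypothetical GSEMO-style loop; returns the non-dominated archive as
    (coreset, (size, error)) pairs sorted by size."""
    rng = random.Random(seed)
    n = len(train_X)

    def evaluate(core):
        # the two objectives to minimize: number of points and validation error
        return (len(core), one_nn_error(core, train_X, train_y, val_X, val_y))

    full = frozenset(range(n))  # start from the whole training set
    archive = {full: evaluate(full)}
    for _ in range(generations):
        parent = rng.choice(list(archive))
        # mutation: toggle 1-3 random samples (add if absent, remove if present)
        flips = rng.sample(range(n), k=rng.randint(1, min(3, n)))
        child = parent.symmetric_difference(flips)
        if child in archive:
            continue
        f = evaluate(child)
        if any(dominates(g, f) or g == f for g in archive.values()):
            continue  # the child adds nothing new to the front
        # keep the child and drop every archived coreset it dominates
        archive = {c: g for c, g in archive.items() if not dominates(f, g)}
        archive[child] = f
    return sorted(archive.items(), key=lambda item: item[1])

# Toy usage: two well-separated 2-D clusters labelled 0 and 1
train_X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]
val_X = [(0.3, 0.2), (4.9, 5.2)]
val_y = [0, 1]
front = evolve_coresets(train_X, train_y, val_X, val_y)
for core, (size, err) in front:
    print(size, err, sorted(core))
```

Because the loop starts from the full training set and an archived coreset is discarded only when a child with no larger error dominates it, the lowest error in the archive never exceeds that of the full set; the other entries trace the size/error trade-off the abstract refers to.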

Results from the Paper


Task                 Dataset                  Model    Metric name   Metric value  Global rank
Core set discovery   Abalone                  EvoCore  F1 (10-fold)  18.6          #1
Core set discovery   Amazon-employee-access   EvoCore  F1 (10-fold)  91.5          #1
Core set discovery   Credit-g                 EvoCore  F1 (10-fold)  74.3          #1
Core set discovery   Electricity              EvoCore  F1 (10-fold)  69.3          #1
Core set discovery   Glass identification     EvoCore  F1 (10-fold)  64.3          #1
Core set discovery   ISOLET                   EvoCore  F1 (10-fold)  90.5          #1
Core set discovery   JM1                      EvoCore  F1 (10-fold)  77.1          #1
Core set discovery   Kr-vs-kp                 EvoCore  F1 (10-fold)  93.7          #1
Core set discovery   Letter                   EvoCore  F1 (10-fold)  65.9          #1
Core set discovery   micro-mass               EvoCore  F1 (10-fold)  83.9          #1
Core set discovery   MNIST                    EvoCore  F1 (10-fold)  77.2          #1
Core set discovery   Mozilla4                 EvoCore  F1 (10-fold)  91.2          #1
Core set discovery   Soybean                  EvoCore  F1 (10-fold)  91.1          #1
Core set discovery   UCI GAS                  EvoCore  F1 (10-fold)  94.6          #1
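
The F1 score in the table is the harmonic mean of precision and recall. A hedged reading of "F1 (10-fold)", assuming the per-fold scores are averaged and reported as percentages (the page does not state how per-class scores are combined on multi-class datasets such as Letter or MNIST), is:

```latex
\mathrm{F1}_k = \frac{2\,P_k R_k}{P_k + R_k},
\qquad
\mathrm{F1}_{\text{10-fold}} = \frac{100}{10} \sum_{k=1}^{10} \mathrm{F1}_k
```

where P_k and R_k are the precision and recall measured on held-out fold k.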

Methods