no code implementations • 16 Feb 2022 • Enrico Dandolo, Andrea Pietracaprina, Geppino Pucci
A more general formulation, known as $k$-means with $z$ outliers, introduced to deal with noisy datasets, features a further parameter $z$ and allows up to $z$ points of $P$ (outliers) to be disregarded when computing the aforementioned sum.
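As an illustration only, the objective can be sketched as follows, assuming the "aforementioned sum" is the standard $k$-means cost (the sum of squared Euclidean distances from each point to its closest center). The function name `kmeans_cost_with_outliers` and the sample data are hypothetical and are not taken from the paper; the sketch evaluates the cost of a given set of centers and does not search for good centers.

```python
def kmeans_cost_with_outliers(points, centers, z):
    """Cost of `centers` on `points`, disregarding the z costliest points.

    Assumption: the cost of a point is its squared Euclidean distance
    to the closest center (the standard k-means objective).
    """
    # Squared distance from each point to its closest center, ascending.
    sq_dists = sorted(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )
    # For fixed centers, the best choice of outliers is the z farthest points.
    kept = sq_dists[: max(0, len(sq_dists) - z)]
    return sum(kept)


# Hypothetical toy data: one point lies far from the rest.
P = [(0, 0), (1, 0), (10, 0)]
print(kmeans_cost_with_outliers(P, [(0, 0)], z=0))  # 101
print(kmeans_cost_with_outliers(P, [(0, 0)], z=1))  # 1
```

With $z = 1$ the distant point is disregarded, so the cost drops from 101 to 1, which is the robustness to noise that the formulation is meant to provide.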
1 code implementation • 7 Jan 2022 • Paolo Pellizzoni, Andrea Pietracaprina, Geppino Pucci
We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window $W$ of the most recent data items.
1 code implementation • 3 Mar 2020 • Federico Altieri, Andrea Pietracaprina, Geppino Pucci, Fabio Vandin
The experiments provide evidence that, unlike other heuristics, our estimation strategy not only provides tight theoretical guarantees but also returns highly accurate estimates in a fraction of the time required by the exact computation. They also show that its distributed implementation is highly scalable, enabling the computation of internal measures for very large datasets for which the exact computation is prohibitive.
no code implementations • 18 Feb 2020 • Andrea Pietracaprina, Geppino Pucci, Federico Soldà
Given a dataset $V$ of points from some metric space, the popular $k$-center problem requires identifying a subset of $k$ points (centers) in $V$ that minimizes the maximum distance of any point of $V$ from its closest center.
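A minimal sketch of this objective, assuming points in Euclidean space, is given below. The centers are chosen with the classic greedy farthest-first traversal, a well-known 2-approximation for $k$-center; it is used here only as a simple baseline and is not claimed to be the algorithm of the paper. The function names `kcenter_radius` and `farthest_first` are hypothetical.

```python
import math


def kcenter_radius(points, centers):
    """k-center objective: maximum distance of any point from its closest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def farthest_first(points, k):
    """Greedy farthest-first traversal (a classic 2-approximation baseline)."""
    centers = [points[0]]  # arbitrary first center
    # dist[i] = distance of points[i] from its closest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Next center: the point currently farthest from all chosen centers.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


# Hypothetical toy data: two well-separated pairs of points.
V = [(0, 0), (1, 0), (10, 0), (11, 0)]
C = farthest_first(V, k=2)
print(C)                     # [(0, 0), (11, 0)]
print(kcenter_radius(V, C))  # 1.0
```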
1 code implementation • 18 May 2016 • Matteo Ceccarello, Andrea Pietracaprina, Geppino Pucci, Eli Upfal
Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset.
Distributed, Parallel, and Cluster Computing
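The two diversity objectives named in the diversity maximization abstract above (minimum and average pairwise distance) can be sketched as follows, assuming Euclidean points and $k \geq 2$. The exhaustive search in `best_subset` is an illustrative assumption that is feasible only for tiny inputs; it is not the paper's method, and all function names are hypothetical.

```python
import math
from itertools import combinations


def min_pairwise(subset):
    """Diversity as the minimum distance between two points of the subset."""
    return min(math.dist(p, q) for p, q in combinations(subset, 2))


def avg_pairwise(subset):
    """Diversity as the average distance between two points of the subset."""
    pairs = list(combinations(subset, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def best_subset(points, k, objective):
    """Exhaustive search over all k-subsets (exponential; tiny inputs only)."""
    return max(combinations(points, k), key=objective)


# Hypothetical toy data on a line.
S = [(0, 0), (1, 0), (5, 0), (10, 0)]
print(best_subset(S, 3, min_pairwise))  # ((0, 0), (5, 0), (10, 0))
```

Passing `avg_pairwise` instead of `min_pairwise` switches the objective without changing the search.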