Revisiting Machine Learning from Crowds: a Mixture Model for Grouping Annotations

Lecture Notes in Computer Science 2019 · Francisco Mena, Ricardo Ñanculef

Supervised learning is now widely used for pattern recognition, computer vision, and other tasks. In this setting, data must be explicitly annotated. Unfortunately, obtaining accurate labels can be difficult, expensive, and time-consuming. As a result, many machine learning projects rely on labelling processes that involve crowds, i.e. multiple subjective and inexpert annotators. Handling the resulting label noise in a principled way is an important challenge for machine learning, known as learning from crowds. In this paper, we present a model that learns patterns of label noise by grouping annotations. In contrast to prior work, we do not model a specific labelling pattern for each annotator but instead explain the data using a fixed-size mixture model. This approach can handle a sparse distribution of labels among annotators and yields a model with fewer parameters that scales better to large scenarios. Experiments on real and simulated data illustrate the advantages of our approach.
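
To make the parameter-count argument concrete, the following is a minimal sketch, not the authors' implementation. It assumes that label noise is described by row-stochastic confusion matrices, that the per-annotator baseline keeps one matrix per annotator, and that the grouped alternative keeps a fixed number of matrices plus mixing weights. The function names and the sizes `K`, `T`, and `M` are hypothetical and chosen only for illustration.

```python
import random


def confusion_param_count(num_classes, num_matrices):
    """Free parameters in `num_matrices` row-stochastic K x K confusion matrices.

    Each row sums to one, so it has K - 1 free entries.
    """
    return num_matrices * num_classes * (num_classes - 1)


def sample_annotation(true_label, group_weights, group_confusions, rng):
    """Draw one noisy label from a fixed-size mixture (illustrative assumption).

    First pick a group according to the mixing weights, then draw the observed
    label from that group's confusion-matrix row for the true label.
    """
    group = rng.choices(range(len(group_weights)), weights=group_weights)[0]
    row = group_confusions[group][true_label]
    return rng.choices(range(len(row)), weights=row)[0]


# Hypothetical sizes: 3 classes, 1000 annotators, 4 annotation groups.
K, T, M = 3, 1000, 4

# One confusion matrix per annotator grows linearly with the crowd size.
per_annotator = confusion_param_count(K, T)

# A fixed-size mixture needs M matrices plus M - 1 free mixing weights,
# regardless of how many annotators contribute labels.
grouped = confusion_param_count(K, M) + (M - 1)

# Example mixture with two groups: one mostly reliable, one uniformly random.
reliable = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
uniform = [[1 / 3] * 3 for _ in range(3)]
rng = random.Random(0)
noisy_label = sample_annotation(1, [0.7, 0.3], [reliable, uniform], rng)
```

Under these assumed sizes the per-annotator count is 6000 and the grouped count is 27. The grouped count also does not depend on how few labels any single annotator provides, which is the sparsity point made in the abstract.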
