Semi-supervised Ensemble Learning with Weak Supervision for Biomedical Relationship Extraction

AKBC 2019 · Antonios Minas Krasakis, Evangelos Kanoulas, George Tsatsaronis ·

Natural language understanding research has recently shifted towards complex Machine Learning and Deep Learning algorithms. Such models often outperform their simpler counterparts significantly. However, their performance relies on the availability of large amounts of labeled data, which are rarely available. To tackle this problem, we propose a methodology for extending training datasets to arbitrarily big sizes and training complex, data-hungry models using weak supervision. We apply this methodology on biomedical relation extraction, a task where training datasets are excessively time-consuming and expensive to create, yet has a major impact on downstream applications such as drug discovery. We demonstrate in two small-scale controlled experiments that our method consistently enhances the performance of an LSTM network, with performance improvements comparable to hand-labeled training data. Finally, we discuss the optimal setting for applying weak supervision using this methodology.

PDF Abstract