SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data

20 Apr 2017 · Jason Fries, Sen Wu, Alex Ratner, Christopher Ré

We present SwellShark, a framework for building biomedical named entity recognition (NER) systems quickly and without hand-labeled data. Our approach treats biomedical resources such as lexicons as function primitives for automatically generating weak supervision. We then use a generative model to unify and denoise this supervision and to construct large-scale, probabilistically labeled datasets for training high-accuracy NER taggers. On three biomedical NER tasks, SwellShark achieves scores competitive with state-of-the-art supervised benchmarks while using no hand-labeled training data. In a drug name extraction task on patient medical records, one domain expert using SwellShark came within 5.1% of a crowdsourced annotation approach in 24 hours; the crowdsourced approach originally used 20 teams working over several weeks.
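The pipeline in the abstract (resources turned into labeling functions, a generative model that denoises their votes, probabilistic labels as output) can be illustrated in miniature. The following is a minimal sketch under stated assumptions, not SwellShark's implementation: the lexicons, suffix heuristics, and candidate spans are hypothetical toy data, and the generative model is a simple one-coin expectation-maximization label model in the spirit of Dawid and Skene (labeling functions assumed independent given the true label, class prior fixed at 0.5), swapped in for illustration rather than taken from the paper. It scores single-token candidates as "drug" or "not a drug" and returns a probability per span; training the downstream NER tagger on those probabilities is omitted.

```python
import math

ABSTAIN, NEG, POS = 0, -1, 1

# Hypothetical toy lexicons standing in for curated biomedical resources.
DRUG_LEXICON = {"aspirin", "ibuprofen", "warfarin", "metoprolol"}
DISEASE_LEXICON = {"arthritis", "anemia", "leukemia"}


def lexicon_lf(lexicon, label):
    """Turn a lexicon into a labeling function: vote `label` on an exact match, else abstain."""
    def lf(span):
        return label if span.lower() in lexicon else ABSTAIN
    return lf


def suffix_lf(suffixes, label):
    """Vote `label` when the span ends with one of `suffixes`, else abstain."""
    def lf(span):
        return label if span.lower().endswith(suffixes) else ABSTAIN
    return lf


def long_word_lf(span):
    """Deliberately noisy heuristic: call every long token a drug."""
    return POS if len(span) >= 9 else ABSTAIN


LFS = {
    "drug_lexicon": lexicon_lf(DRUG_LEXICON, POS),
    "drug_suffix": suffix_lf(("fen", "olol", "mab"), POS),
    "disease_lexicon": lexicon_lf(DISEASE_LEXICON, NEG),
    "disease_suffix": suffix_lf(("itis", "osis", "emia"), NEG),
    "long_word": long_word_lf,
}


def fit_label_model(spans, lfs, n_iter=50, init_acc=0.7, eps=0.01):
    """One-coin EM label model (illustrative stand-in, not the paper's model).

    Assumes each labeling function is correct with probability acc[name] when it
    votes, functions are independent given the true label, abstaining carries no
    information, and the class prior is fixed at 0.5.
    Returns P(span is a drug) for each span and the estimated accuracies.
    """
    votes = {name: [lf(s) for s in spans] for name, lf in lfs.items()}
    acc = {name: init_acc for name in lfs}
    probs = [0.5] * len(spans)
    for _ in range(n_iter):
        # E-step: each non-abstaining vote adds +/- log(acc / (1 - acc)) to the log-odds of POS.
        for i in range(len(spans)):
            logit = sum(
                votes[name][i] * math.log(acc[name] / (1.0 - acc[name]))
                for name in lfs
                if votes[name][i] != ABSTAIN
            )
            probs[i] = 1.0 / (1.0 + math.exp(-logit))
        # M-step: accuracy = expected agreement with the current posteriors,
        # clamped to [eps, 1 - eps] so the log-odds stay finite.
        for name in lfs:
            agree = [probs[i] if v == POS else 1.0 - probs[i]
                     for i, v in enumerate(votes[name]) if v != ABSTAIN]
            if agree:
                acc[name] = min(max(sum(agree) / len(agree), eps), 1.0 - eps)
    return probs, acc


SPANS = ["aspirin", "ibuprofen", "warfarin", "metoprolol", "adalimumab",
         "arthritis", "fibrosis", "anemia", "leukemia"]

if __name__ == "__main__":
    probs, acc = fit_label_model(SPANS, LFS)
    for span, p in zip(SPANS, probs):
        print(f"{span:<12} P(drug) = {p:.2f}")
    for name, a in acc.items():
        print(f"{name:<16} estimated accuracy = {a:.2f}")
```

The noisy `long_word` rule votes "drug" on `arthritis`, but the two disease-side functions agree with each other there, so EM estimates `long_word` as the least accurate source and the conflict resolves toward "not a drug". With only nine spans, the accuracies of functions that never disagree drift to the clamp, so the numbers are illustrative only.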

Task: Weakly-Supervised Named Entity Recognition
Dataset: BC5CDR
Model: SwellShark

Metric      Value   Global rank
Precision   86.1    #1
Recall      82.4    #2
F1          84.2    #2
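As a quick consistency check on these BC5CDR figures, and assuming the reported F1 is the standard harmonic mean of the reported precision and recall (the page does not state how it was computed), the three values agree up to rounding. The function name `f1_score` below is a hypothetical helper, not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


print(f"{f1_score(86.1, 82.4):.1f}")  # prints 84.2
```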
