Accurate clinical and biomedical named entity recognition at scale

Software Impacts 2022 · Kocaman, Veysel; Talby, David

We introduce an agile, production-grade clinical and biomedical named entity recognition (NER) algorithm based on a modified BiLSTM-CNN-Char deep learning (DL) architecture built on top of Apache Spark. Our NER implementation establishes new state-of-the-art accuracy on 7 of 8 well-known biomedical NER benchmarks and on 3 clinical concept extraction challenges: 2010 i2b2/VA clinical concept extraction, 2014 n2c2 de-identification, and 2018 n2c2 medication extraction. Moreover, clinical NER models trained with this implementation outperform the commercial entity extraction solutions AWS Comprehend Medical and Google Cloud Healthcare API by a large margin (8.9% and 6.7%, respectively), without using memory-intensive language models.
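The abstract names the BiLSTM-CNN-Char architecture but does not describe it. The NumPy forward pass below is a rough illustration of the general idea. It is not the authors' modified model and not the Spark NLP implementation. In this sketch, a character-level CNN with max-pooling turns each word's characters into a fixed-size vector. That vector is concatenated with a word embedding, and a bidirectional LSTM reads the resulting sequence before a softmax scores one tag per token. The following are hypothetical and chosen only for illustration:

- all dimensions
- the `params` dictionary
- the function names `char_cnn`, `lstm` and `tag_sentence`

The weights are random and untrained. There is no CRF or other decoding layer, and the Spark distribution layer is omitted entirely.

```python
import numpy as np

# Hypothetical sizes for illustration only; none are taken from the paper.
CHAR_VOCAB, CHAR_DIM, N_FILTERS, KERNEL = 64, 16, 32, 3
WORD_DIM, HIDDEN, N_TAGS = 50, 40, 5

rng = np.random.default_rng(0)

def init(*shape):
    """Small random weights; a trained model would learn these."""
    return rng.normal(scale=0.1, size=shape)

params = {
    "char_emb": init(CHAR_VOCAB, CHAR_DIM),
    "conv_w": init(KERNEL * CHAR_DIM, N_FILTERS),
    "conv_b": np.zeros(N_FILTERS),
    # One LSTM per direction; gates stacked as [input, forget, cell, output].
    "fwd_W": init(WORD_DIM + N_FILTERS + HIDDEN, 4 * HIDDEN),
    "fwd_b": np.zeros(4 * HIDDEN),
    "bwd_W": init(WORD_DIM + N_FILTERS + HIDDEN, 4 * HIDDEN),
    "bwd_b": np.zeros(4 * HIDDEN),
    "out_W": init(2 * HIDDEN, N_TAGS),
    "out_b": np.zeros(N_TAGS),
}

def char_cnn(char_ids):
    """Convolve over one word's characters, then max-pool to a fixed-size vector."""
    x = params["char_emb"][char_ids]                 # (n_chars, CHAR_DIM)
    pad = KERNEL // 2
    x = np.pad(x, ((pad, pad), (0, 0)))              # zero-pad for same-length output
    windows = np.stack([x[i:i + KERNEL].ravel() for i in range(len(char_ids))])
    feats = np.maximum(windows @ params["conv_w"] + params["conv_b"], 0.0)  # ReLU
    return feats.max(axis=0)                         # (N_FILTERS,)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm(inputs, W, b):
    """Run a single-layer LSTM over a (T, D) sequence and return (T, HIDDEN)."""
    h, c = np.zeros(HIDDEN), np.zeros(HIDDEN)
    outputs = []
    for x_t in inputs:
        z = np.concatenate([x_t, h]) @ W + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def tag_sentence(word_vectors, char_ids_per_word):
    """Return per-token tag probabilities with shape (n_tokens, N_TAGS)."""
    char_feats = np.stack([char_cnn(np.asarray(ids)) for ids in char_ids_per_word])
    tokens = np.concatenate([word_vectors, char_feats], axis=1)
    fwd = lstm(tokens, params["fwd_W"], params["fwd_b"])
    bwd = lstm(tokens[::-1], params["bwd_W"], params["bwd_b"])[::-1]  # re-align to token order
    logits = np.concatenate([fwd, bwd], axis=1) @ params["out_W"] + params["out_b"]
    logits -= logits.max(axis=1, keepdims=True)      # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Toy 3-token sentence: random stand-in word vectors and made-up character ids.
words = rng.normal(size=(3, WORD_DIM))
chars = [[3, 7, 9], [12, 4], [5, 5, 8, 2]]
probs = tag_sentence(words, chars)
print(probs.shape)           # (3, 5)
print(probs.argmax(axis=1))  # predicted tag index per token (arbitrary, since untrained)
```

The character CNN lets the tagger use subword cues such as suffixes, digits and capitalization. These cues are common in drug and chemical names and are not captured by word embeddings alone.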

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Named Entity Recognition (NER) | AnatEM | BertForTokenClassification (Spark NLP) | F1 | 91.65 | #5 |
| Named Entity Recognition (NER) | BC4CHEMD | BertForTokenClassification (Spark NLP) | F1 | 94.39 | #5 |
| Named Entity Recognition (NER) | BC5CDR | BertForTokenClassification (Spark NLP) | F1 | 90.89 | #5 |
| Named Entity Recognition (NER) | BioNLP13-CG | BertForTokenClassification (Spark NLP) | F1 | 87.83 | #2 |
| Named Entity Recognition (NER) | Species800 | BertForTokenClassification (Spark NLP) | F1 | 82.59 | #2 |
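The leaderboard rows report F1. NER benchmarks usually score F1 at the entity level, but this page does not state the matching protocol or tagging scheme behind these entries. The following is a minimal sketch that assumes strict matching over BIO tags: a predicted entity counts only if its boundaries and type both agree with the gold entity. The functions `bio_spans` and `entity_f1` and the tag names are hypothetical.

```python
def bio_spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence; end is exclusive."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        # A span ends at "O", at any "B-", or at an "I-" whose type differs from the open span.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
            start, etype = (i, tag[2:]) if tag != "O" else (None, None)
    return spans

def entity_f1(gold, pred):
    """Strict entity-level F1 computed from gold and predicted BIO sequences."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

gold = ["B-Chemical", "I-Chemical", "O", "B-Disease", "O"]
pred = ["B-Chemical", "I-Chemical", "O", "B-Disease", "I-Disease"]
print(entity_f1(gold, pred))  # → 0.5
```

In this example the predicted disease span extends one token too far, so it does not count as a match. That leaves precision and recall at 0.5 each and F1 at 0.5. Lenient or token-level scoring would give a higher number for the same output, so F1 values are only comparable when computed under the same protocol.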
