No means ‘No’; a non-improper modeling approach, with embedded speculative context

Motivation: Medical data are complex in nature, as terms in clinical records often appear in different contexts. In this paper, we investigate the embeddings of several biomedical language models (BioBERT, BioELECTRA, PubMedBERT) for their understanding of negation and speculation, and we find that these models fail to differentiate negated from non-negated contexts. To measure this understanding, we compare cosine similarity scores between the embeddings of negated sentences and their non-negated counterparts. To improve these models, we introduce a generic super-tuning approach that enhances the embeddings' handling of negation and speculation using a synthesized dataset.

Results: After super-tuning, the models' embeddings capture negative and speculative contexts markedly better. Furthermore, we fine-tuned the super-tuned models on downstream tasks and found that they outperform previous models, achieving state-of-the-art (SOTA) results on negation and speculation cue detection and scope resolution on the BioScope abstracts and the Sherlock dataset. We also confirmed that super-tuning incurs only a minimal trade-off in performance on other tasks such as Natural Language Inference.
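To make the measurement concrete, below is a minimal sketch of the cosine-similarity probe described above, using the Hugging Face `transformers` library. The checkpoint name is one public PubMedBERT release and the mean-pooling choice is an assumption for illustration; the paper's exact code and pooling strategy may differ.

```python
# Sketch (not the paper's exact code): compare embeddings of a negated
# vs. non-negated sentence pair with a biomedical BERT-style model.
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint: a public PubMedBERT release, not necessarily the
# one evaluated in the paper.
MODEL_NAME = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pool the last hidden states over non-padding tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, hidden)
    mask = inputs["attention_mask"].unsqueeze(-1)    # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

pos = sentence_embedding("The patient shows signs of pneumonia.")
neg = sentence_embedding("The patient shows no signs of pneumonia.")

# A model that encodes negation should assign this pair a clearly lower
# similarity than it would a paraphrase pair; the paper's finding is
# that off-the-shelf biomedical models score such pairs nearly alike.
score = torch.nn.functional.cosine_similarity(pos, neg).item()
print(f"cosine similarity (negated vs. non-negated): {score:.4f}")
```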

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Speculation Scope Resolution | BioScope: Abstracts | NegBioELECTRA | F1 | 98.37 | #1 |
| Negation Scope Resolution | BioScope: Abstracts | NegBioELECTRA | F1 | 98.94 | #1 |
| Negation and Speculation Cue Detection | BioScope: Abstracts | NegBioELECTRA | F1 | 99.02 | #1 |
| Negation and Speculation Cue Detection | *SEM 2012 Shared Task: Sherlock Dataset | NegBioELECTRA | F1 | 99.56 | #1 |
| Negation Scope Resolution | *SEM 2012 Shared Task: Sherlock Dataset | NegBioELECTRA | F1 | 97.26 | #1 |
