Development and Testing of a Novel Large Language Model-Based Clinical Decision Support Systems for Medication Safety in 12 Clinical Specialties

Importance: We introduce a novel Retrieval Augmented Generation (RAG)-Large Language Model (LLM) framework as a Clinical Decision Support System (CDSS) to support safe medication prescription.

Objective: To evaluate the efficacy of an LLM-based CDSS in correctly identifying medication errors in patient case vignettes from diverse medical and surgical sub-disciplines, against a ground truth derived from a human expert panel. We compared performance under 2 practical modalities for integrating the CDSS into healthcare: LLM-based CDSS alone (fully autonomous mode) vs junior pharmacist + LLM-based CDSS (co-pilot, assistive mode).

Design, Setting, and Participants: Utilizing a RAG model with state-of-the-art medically-related LLMs (GPT-4, Gemini Pro 1.0 and Med-PaLM 2), this study used 61 prescribing error scenarios embedded into 23 complex clinical vignettes across 12 different medical and surgical specialties. A multidisciplinary expert panel assessed these cases for Drug-Related Problems (DRPs) using the PCNE classification and graded severity / potential for harm using the revised NCC MERP medication error index.

Results: RAG-LLM outperformed LLM alone. In co-pilot mode, accuracy, recall, and F1 scores were highest, indicating effectiveness in identifying moderate to severe DRPs. The accuracy of DRP detection with RAG-LLM improved in several categories, but at the expense of lower precision.

Conclusions: This study established that a RAG-LLM based CDSS significantly boosts the accuracy of medication error identification when used alongside junior pharmacists (co-pilot), with notable improvements in detecting severe DRPs. This study also illuminates the comparative performance of current state-of-the-art LLMs in RAG-based CDSS.
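The abstract reports accuracy, precision, recall, and F1 for DRP detection against the expert-panel ground truth. As a minimal sketch of how such scores are conventionally computed, the Python below treats each candidate item as a binary decision (a DRP is present or not, and is flagged or not). The function name `score_drp_detection`, the `DetectionMetrics` container, and the toy labels are hypothetical and for illustration only; they are not the authors' evaluation code or data, and the paper may define its units of analysis differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DetectionMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def score_drp_detection(truth: list[bool], flagged: list[bool]) -> DetectionMetrics:
    """Score binary DRP detection against a ground truth.

    truth[i]   -- True if the expert panel marked item i as a DRP (assumed format)
    flagged[i] -- True if the system (or pharmacist + system) flagged item i
    """
    if len(truth) != len(flagged):
        raise ValueError("truth and flagged must have the same length")
    if not truth:
        raise ValueError("at least one item is required")

    # Confusion-matrix counts.
    tp = sum(t and f for t, f in zip(truth, flagged))
    fp = sum((not t) and f for t, f in zip(truth, flagged))
    fn = sum(t and (not f) for t, f in zip(truth, flagged))
    tn = sum((not t) and (not f) for t, f in zip(truth, flagged))

    accuracy = (tp + tn) / len(truth)
    # Guard against division by zero when nothing is flagged or no DRP exists.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return DetectionMetrics(accuracy, precision, recall, f1)


# Toy labels, invented purely to exercise the function.
toy_truth = [True, True, True, False, False]
toy_flagged = [True, True, False, True, False]
toy_metrics = score_drp_detection(toy_truth, toy_flagged)
```

Under these definitions, a system that flags more items tends to gain recall while admitting more false positives, which is one way accuracy or recall can improve "at the expense of lower precision" as the Results describe.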
