Measuring semantic similarity of clinical trial outcomes using deep pre-trained language representations

Background: Outcomes are variables monitored during a clinical trial to assess the impact of an intervention on human health. Automatic assessment of the semantic similarity of trial outcomes is required for a number of tasks, such as detection of outcome switching (unjustified changes to the pre-defined outcomes of a trial) and implementation of Core Outcome Sets (minimal sets of outcomes that should be reported in a particular medical domain).

Objective: We aimed to build an algorithm for assessing the semantic similarity of pairs of primary and reported outcomes. We focused on approaches that do not require manually curated domain-specific resources such as ontologies and thesauri.

Methods: We tested several approaches, including single measures of similarity (based on strings, stems and lemmas, paths and distances in an ontology, and vector representations of phrases), classifiers using a combination of single measures as features, and a deep learning approach that consists of fine-tuning pre-trained deep language representations. We tested language models provided by BERT (trained on general-domain texts), BioBERT, and SciBERT (trained on biomedical and scientific texts, respectively). We explored the possibility of improving the results by taking into account variants for referring to an outcome (e.g., the use of a measurement tool name instead of the outcome name, or the use of abbreviations). We release an open corpus annotated for the similarity of pairs of outcomes.

Results: Classifiers using a combination of single measures as features outperformed the single measures, while deep learning algorithms using BioBERT and SciBERT models outperformed the classifiers. BioBERT reached the best F-measure of 89.75%. The addition of variants of outcomes did not improve the results for the best-performing single measures or for the classifiers, but it improved the performance of the deep learning algorithms: BioBERT achieved an F-measure of 93.38%.

Conclusions: Deep learning approaches using pre-trained language representations outperformed the other approaches for similarity assessment of trial outcomes, without relying on any manually curated domain-specific resources (ontologies and other lexical resources). The addition of variants of outcomes further improved the performance of the deep learning algorithms.
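The two families of approaches compared in the abstract lend themselves to short sketches. First, the "combination of single measures as features" baseline: each outcome pair is reduced to a few cheap similarity scores that a standard classifier then combines. The measures below (a character-level ratio and token-level Jaccard overlap), the toy pairs, and the choice of logistic regression are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of a feature-combination baseline: single similarity measures
# are computed per outcome pair and fed to a classifier as features.
# Measures, data, and classifier are illustrative, not the paper's.
from difflib import SequenceMatcher
from sklearn.linear_model import LogisticRegression

def pair_features(a, b):
    a, b = a.lower(), b.lower()
    char_sim = SequenceMatcher(None, a, b).ratio()               # string-based measure
    ta, tb = set(a.split()), set(b.split())
    jaccard = len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0  # token-overlap measure
    return [char_sim, jaccard]

pairs = [("overall survival", "overall survival rate"),
         ("pain intensity", "quality of life"),
         ("systolic blood pressure", "systolic BP"),
         ("depression score", "weight loss")]
labels = [1, 0, 1, 0]  # 1 = semantically similar (toy annotations)

clf = LogisticRegression().fit([pair_features(a, b) for a, b in pairs], labels)
print(clf.predict([pair_features("mortality rate", "overall mortality")]))
```

Second, the deep learning approach: BERT-style models treat a pair of outcomes as a single `[CLS] outcome_a [SEP] outcome_b [SEP]` sequence, and fine-tuning adds a binary classification head on top. Below is a minimal sketch using the Hugging Face transformers library; the BioBERT checkpoint name, hyperparameters, and training pairs are assumptions rather than the paper's exact configuration.

```python
# Sketch of sentence-pair fine-tuning with a BioBERT-style checkpoint.
# Checkpoint name, hyperparameters, and data are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "dmis-lab/biobert-base-cased-v1.1"  # assumed BioBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

train_pairs = [("overall survival", "death from any cause"),
               ("pain at 6 weeks", "quality of life")]  # toy examples
train_labels = torch.tensor([1, 0])                     # 1 = similar

# Each pair is encoded as one sequence: [CLS] outcome_a [SEP] outcome_b [SEP]
batch = tokenizer([a for a, _ in train_pairs],
                  [b for _, b in train_pairs],
                  padding=True, truncation=True, max_length=64,
                  return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few epochs, as is typical for BERT fine-tuning
    loss = model(**batch, labels=train_labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```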

All results are for the Semantic Similarity task; each model is fine-tuned on the corpus it is evaluated on, and the global rank (#) is given per metric on the corresponding benchmark.

Annotated corpus for semantic similarity of clinical trial outcomes (expanded corpus):

| Model | F1 | Precision | Recall |
|---|---|---|---|
| BioBERT (pre-trained on PubMed abstracts + PMC) | 93.38 (#1) | 92.98 (#1) | 93.85 (#1) |
| SciBERT uncased (SciVocab) | 91.51 (#2) | 91.3 (#2) | 91.79 (#3) |
| SciBERT cased (SciVocab) | 90.69 (#3) | 89 (#4) | 92.54 (#2) |
| BERT-Base uncased | 89.16 (#4) | 89.31 (#3) | 89.12 (#5) |
| BERT-Base cased | 89.12 (#5) | 88.25 (#5) | 90.1 (#4) |

Annotated corpus for semantic similarity of clinical trial outcomes (original corpus):

| Model | F1 | Precision | Recall |
|---|---|---|---|
| BioBERT (pre-trained on PubMed abstracts + PMC) | 89.75 (#1) | 88.93 (#1) | 90.76 (#3) |
| SciBERT cased (SciVocab) | 89.3 (#2) | 87.31 (#3) | 91.53 (#1) |
| SciBERT uncased (SciVocab) | 89.3 (#2) | 87.99 (#2) | 90.78 (#2) |
| BERT-Base uncased | 86.8 (#4) | 85.76 (#4) | 88.15 (#4) |
| BERT-Base cased | 84.21 (#5) | 83.36 (#5) | 85.2 (#5) |
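The table's three metrics are linked: F1 is the harmonic mean of precision and recall, which is how BioBERT can rank first on F1 for the original corpus while ranking only third on recall. A tiny self-contained check, using made-up predictions rather than the paper's outputs:

```python
# Tiny check of how the table's metrics relate; predictions are made up.
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"precision={p:.2%} recall={r:.2%} F1={f1:.2%}")  # F1 = 2*p*r / (p + r)
```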

Methods