Semantic Enrichment of Pretrained Embedding Output for Unsupervised IR

The rapid growth of scientific literature in the biomedical and clinical domains has significantly complicated the identification of relevant information for researchers and other practitioners. More importantly, the rapid emergence of new topics and findings often hinders the performance of supervised approaches due to the lack of relevant annotated data. The global COVID-19 pandemic further highlighted the need to query and navigate uncharted ground in the scientific literature promptly and efficiently. In this paper we investigate the potential of semantically enhancing deep transformer architectures with SNOMED-CT in order to answer user queries in an unsupervised manner. Our proposed system filters and re-ranks documents related to a query that were initially retrieved using BERT models. To achieve this, we enrich queries and documents with SNOMED-CT concepts and then impose filters on concept co-occurrence between them. We evaluate this approach on the OHSUMED dataset, where it shows competitive performance, and we also present how it can be adapted to full papers, such as those in Kaggle's CORD-19 full-text dataset challenge.
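Below is a minimal sketch of the retrieve-then-filter pipeline the abstract describes. The paper's exact BERT model, SNOMED-CT annotator, and filtering threshold are not specified here, so the embedding model (a sentence-transformers checkpoint standing in for BERT), the precomputed concept sets, and the `min_overlap` parameter are all illustrative assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Stand-in embedding model; the paper's exact BERT variant is not specified here.
model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve_and_filter(query, docs, query_concepts, doc_concepts,
                        top_k=100, min_overlap=1):
    """Rank `docs` by cosine similarity to `query`, then keep only documents
    that share at least `min_overlap` SNOMED-CT concepts with the query.

    `query_concepts` is a set of concept IDs; `doc_concepts` is a list of
    sets, one per document, produced by some external SNOMED-CT annotator.
    """
    q_emb = model.encode([query], normalize_embeddings=True)
    d_emb = model.encode(docs, normalize_embeddings=True)
    scores = (d_emb @ q_emb.T).ravel()        # cosine similarity on unit vectors
    ranked = np.argsort(-scores)[:top_k]      # initial dense (BERT) retrieval
    # Concept co-occurrence filter: drop documents with too little overlap.
    kept = [i for i in ranked
            if len(query_concepts & doc_concepts[i]) >= min_overlap]
    return [(i, float(scores[i])) for i in kept]
```

Keeping the concept filter as a post-retrieval step means the dense index never changes; only the candidate list is pruned and re-ordered by the surviving similarity scores.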


Datasets

OHSUMED
CORD-19

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Information Retrieval | OHSUMED | BERT+CONCEPT FILTER | NDCG | 0.25 | #1 |
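For context on the reported metric, here is a small, self-contained sketch of NDCG over a ranked list of relevance labels. The log2 discount and optional cutoff `k` follow the standard formulation; whether the paper uses binary or graded OHSUMED judgments, and at which cutoff the 0.25 is reported, are not stated here.

```python
import numpy as np

def ndcg(relevances, k=None):
    """NDCG of a ranked list of relevance labels (binary or graded),
    using the standard 1/log2(rank + 1) discount and optional cutoff `k`."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

# e.g. ndcg([1, 0, 1, 0], k=4) ~= 0.92: the relevant document at rank 3
# is discounted relative to the ideal ordering [1, 1, 0, 0].
```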

Methods


No methods listed for this paper.