MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images

14 Jul 2022 · Nasir Hayat, Krzysztof J. Geras, Farah E. Shamout

Multi-modal fusion approaches aim to integrate information from different data sources. Unlike natural datasets, such as those in audio-visual applications, where samples consist of "paired" modalities, data in healthcare is often collected asynchronously. Hence, requiring the presence of all modalities for a given sample is not realistic for clinical tasks and significantly limits the size of the dataset during training. In this paper, we propose MedFuse, a conceptually simple yet promising LSTM-based fusion module that can accommodate uni-modal as well as multi-modal input. We evaluate the fusion method and introduce new benchmark results for in-hospital mortality prediction and phenotype classification, using clinical time-series data in the MIMIC-IV dataset and corresponding chest X-ray images in MIMIC-CXR. Compared to more complex multi-modal fusion strategies, MedFuse outperforms by a large margin on the fully paired test set. It also remains robust on the partially paired test set, which contains samples with missing chest X-ray images. We release our code for reproducibility and to enable the evaluation of competing models in the future.
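
To make the core idea concrete, below is a minimal sketch of an LSTM-based fusion module in the spirit of the abstract: pre-extracted modality features are projected into a shared space and fed to an LSTM as a variable-length sequence, so a sample with a missing chest X-ray simply yields a shorter sequence. All names (`LSTMFusion`, `ehr_proj`, `cxr_proj`), dimensions, and design details here are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class LSTMFusion(nn.Module):
    """Illustrative LSTM-based fusion over per-modality feature vectors."""

    def __init__(self, ehr_dim=76, cxr_dim=1024, hidden_dim=256, num_classes=25):
        super().__init__()
        # Project each modality's features into a shared embedding space.
        self.ehr_proj = nn.Linear(ehr_dim, hidden_dim)
        self.cxr_proj = nn.Linear(cxr_dim, hidden_dim)
        # The fusion LSTM consumes modality embeddings as a (short) sequence,
        # so uni-modal and multi-modal inputs are handled by the same module.
        self.fusion_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, ehr_feat, cxr_feat=None):
        # ehr_feat: (batch, ehr_dim); cxr_feat: (batch, cxr_dim) or None
        # when the chest X-ray is missing (uni-modal input).
        tokens = [self.ehr_proj(ehr_feat)]
        if cxr_feat is not None:
            tokens.append(self.cxr_proj(cxr_feat))
        seq = torch.stack(tokens, dim=1)      # (batch, 1 or 2, hidden_dim)
        _, (h_n, _) = self.fusion_lstm(seq)   # h_n: (1, batch, hidden_dim)
        return self.classifier(h_n[-1])       # logits: (batch, num_classes)

model = LSTMFusion()
ehr = torch.randn(4, 76)         # e.g. encoded clinical time-series features
cxr = torch.randn(4, 1024)       # e.g. CNN features from a chest X-ray
paired_logits = model(ehr, cxr)  # fully paired sample
unpaired_logits = model(ehr)     # sample with the chest X-ray missing
```

The sequential formulation is what lets training use partially paired data: unpaired samples still contribute a gradient through the clinical-data path, rather than being discarded for lacking an image.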

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| Phenotype classification | MIMIC-CXR, MIMIC-IV | MedFuse (optimal) | AUROC | 0.77 | #1 |
