M2D: A Multi-modal Framework for Automatic Medical Diagnosis
In this paper, we present M2D, a multimodal deep learning framework for automatic medical condition diagnosis via transfer learning. M2D leverages acoustic and textual features extracted from an audio utterance and its corresponding transcription describing a patient's medical symptoms. The model uses ResNet-34 to learn audio features from log mel-spectrograms and the BioBERT language model to learn textual features. We conducted a comparative performance analysis of M2D against baseline models that use only textual or only acoustic features.