Multimodal Deep Learning

66 papers with code • 1 benchmark • 17 datasets

Multimodal deep learning combines information from multiple modalities, such as text, images, audio, and video, to make more accurate and comprehensive predictions. It involves training deep neural networks on data that spans several types of information and using the trained network to make predictions from the combined input.

A key challenge in multimodal deep learning is combining information from multiple modalities effectively. Common techniques include fusing the features extracted from each modality and using attention mechanisms to weight each modality's contribution according to its importance for the task at hand.
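
The two combination strategies named above can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the function names (`concat_fusion`, `attention_fusion`), the toy feature vectors, and the hand-picked relevance scores. In a trained model the features would come from per-modality encoders and the scores would be learned, so this is not any particular paper's method.

```python
import math

def concat_fusion(text_feat, image_feat):
    """Feature-level fusion: join the per-modality vectors end to end."""
    return list(text_feat) + list(image_feat)

def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fusion(features, scores):
    """Attention-style fusion: weighted sum of same-length modality vectors.

    `scores` holds one (hypothetical) relevance score per modality; a real
    model would typically compute these with a small learned network.
    """
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all modality vectors must share one dimension")
    weights = softmax(scores)
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights

# Toy feature vectors standing in for encoder outputs (illustrative values).
text_feat = [0.2, 0.8]
image_feat = [0.6, 0.4]

fused_concat = concat_fusion(text_feat, image_feat)
fused_attn, attn_weights = attention_fusion([text_feat, image_feat], [1.0, 1.0])
```

Concatenation keeps every feature but grows the input size with each added modality, while the weighted sum keeps the dimension fixed and lets the weights express how much each modality matters for a given input.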

Multimodal deep learning has many applications, including image captioning, speech recognition, natural language processing, and autonomous vehicles. Combining information from multiple modalities can improve the accuracy and robustness of models, helping them perform better in real-world scenarios where multiple types of information are present.

Benchmarks

These leaderboards are used to track progress in Multimodal Deep Learning.

Dataset	Best Model
CUB-200-2011	Two Branch Network (Text - Bert + Image - Nts-Net)

Subtasks

Multimodal Text and Image Classification

Latest papers with no code

Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models

no code yet • 20 Mar 2024

In this paper, we propose Describe-and-Dissect (DnD), a novel method to describe the roles of hidden neurons in vision networks.

Integrating Wearable Sensor Data and Self-reported Diaries for Personalized Affect Forecasting

no code yet • 16 Mar 2024

Emotional states, as indicators of affect, are pivotal to overall health, making their accurate prediction before onset crucial.

A Multimodal Intermediate Fusion Network with Manifold Learning for Stress Detection

no code yet • 12 Mar 2024

We compared various dimensionality reduction techniques for different variations of unimodal and multimodal networks.

Multimodal deep learning approach to predicting neurological recovery from coma after cardiac arrest

no code yet • 9 Mar 2024

This work showcases our team's (The BEEGees) contributions to the 2023 George B. Moody PhysioNet Challenge.

Multimodal Learning To Improve Cardiac Late Mechanical Activation Detection From Cine MR Images

no code yet • 28 Feb 2024

This paper presents a multimodal deep learning framework that utilizes advanced image techniques to improve the performance of clinical analysis heavily dependent on routinely acquired standard images.

Multimodal Deep Learning of Word-of-Mouth Text and Demographics to Predict Customer Rating: Handling Consumer Heterogeneity in Marketing

no code yet • 22 Jan 2024

However, many consumers today post evaluations of specific products on online platforms, which can be a valuable source of information about such unobservable differences among consumers.

Multimodal Urban Areas of Interest Generation via Remote Sensing Imagery and Geographical Prior

no code yet • 12 Jan 2024

Unlike conventional AOI generation methods, such as the Road-cut method that segments road networks at various levels, our approach diverges from semantic segmentation algorithms that depend on pixel-level classification.

Predicting the Skies: A Novel Model for Flight-Level Passenger Traffic Forecasting

no code yet • 7 Jan 2024

This study introduces a novel, multimodal deep learning approach to the challenge of predicting flight-level passenger traffic, yielding substantial accuracy improvements compared to traditional models.

Multimodal self-supervised learning for lesion localization

no code yet • 3 Jan 2024

Multimodal deep learning utilizing imaging and diagnostic reports has made impressive progress in the field of medical imaging diagnostics, demonstrating a particularly strong capability for auxiliary diagnosis in cases where sufficient annotation information is lacking.

Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction

no code yet • 29 Dec 2023

The advantage of the multimodal model lies in its ability to process diverse sources of data using proper models and suitable fusion methods, which would enhance the noise resistance of the model while obtaining data diversity.
