Multimodal Deep Learning
66 papers with code • 1 benchmark • 17 datasets
Multimodal deep learning is a type of deep learning that combines information from multiple modalities, such as text, image, audio, and video, to make more accurate and comprehensive predictions. It involves training deep neural networks on data that includes multiple types of information and using the network to make predictions based on this combined data.
One of the key challenges in multimodal deep learning is how to effectively combine information from multiple modalities. This can be done using a variety of techniques, such as fusing the features extracted from each modality, or using attention mechanisms to weight the contribution of each modality based on its importance for the task at hand.
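The two fusion strategies above can be sketched with toy NumPy vectors. This is a minimal illustration only: the feature values and the mean-based scoring function are made-up stand-ins, not taken from any paper listed here.

```python
import numpy as np

# Toy per-modality feature vectors (e.g., outputs of a text encoder
# and an image encoder); values and dimensions are illustrative.
text_feat = np.array([0.2, 0.8, 0.5])
image_feat = np.array([0.9, 0.1, 0.4])

# Feature fusion: concatenate modality features into one vector
# that a downstream classifier would consume.
fused_concat = np.concatenate([text_feat, image_feat])

def softmax(x):
    # Numerically stable softmax over modality scores.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Attention-style fusion: score each modality (here a stand-in mean
# score), softmax the scores into weights, then take a weighted sum
# of the same-dimensional features.
scores = np.array([text_feat.mean(), image_feat.mean()])
weights = softmax(scores)
fused_attn = weights[0] * text_feat + weights[1] * image_feat
```

In a real model the scoring function would itself be learned, so the network can upweight whichever modality is more informative for the task at hand.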
Multimodal deep learning has many applications, including image captioning, speech recognition, natural language processing, and autonomous vehicles. By combining information from multiple modalities, multimodal deep learning can improve the accuracy and robustness of models, enabling them to perform better in real-world scenarios where multiple types of information are present.
Latest papers with no code
Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction
The advantage of the multimodal model lies in its ability to process diverse sources of data with appropriate models and suitable fusion methods, which enhances the model's noise resistance while capturing data diversity.
A graph-based multimodal framework to predict gentrification
Gentrification, the transformation of a low-income urban area caused by the influx of affluent residents, has many revitalizing benefits.
SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
This is achieved with three main features: a multimodal search engine for a large library of synthesizer sounds; a user-centered genetic algorithm by which completely new sounds can be created and selected given the user's preferences; and a sound-editing support feature that highlights and gives examples for key control parameters with respect to a text- or audio-based query.
TextAug: Test time Text Augmentation for Multimodal Person Re-identification
In this study, we investigate the effectiveness of two computer vision data augmentation techniques: cutout and cutmix, for text augmentation in multi-modal person re-identification.
Multimodal deep learning for mapping forest dominant height by fusing GEDI with earth observation data
Consequently, we proposed a novel deep learning framework termed the multi-modal attention remote sensing network (MARSNet) to estimate forest dominant height by extrapolating dominant height derived from GEDI, using Sentinel-1 data, ALOS-2 PALSAR-2 data, Sentinel-2 optical data, and ancillary data.
Asymmetric Contrastive Multimodal Learning for Advancing Chemical Understanding
Through practical tasks such as isomer discrimination and uncovering crucial chemical properties for drug discovery, ACML exhibits its capability to revolutionize chemical research and applications, providing a deeper understanding of chemical semantics of different modalities.
MalFake: A Multimodal Fake News Identification for Malayalam using Recurrent Neural Networks and VGG-16
Multimodal approaches are more accurate in detecting fake news, as features from multiple modalities are extracted to build the deep learning classification model.
Multimodal Deep Learning for Scientific Imaging Interpretation
Leveraging a multimodal deep learning framework, our approach distills insights from both textual and visual data harvested from peer-reviewed articles, further augmented by the capabilities of GPT-4 for refined data synthesis and evaluation.
A multimodal deep learning architecture for smoking detection with a small data approach
Introduction: Covert tobacco advertisements often raise regulatory measures.
PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
In a joint vision-language space, a text feature (e.g., from "a photo of a dog") could effectively represent its relevant image features (e.g., from dog photos).