We introduce a novel framework for evaluating multimodal deep learning models with respect to their language understanding and generalization abilities.
Multimodal emotion recognition from speech is an important area in affective computing.
Combining complementary information from multiple modalities is intuitively appealing for improving the performance of learning-based approaches.
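To make that idea concrete, here is a minimal late-fusion sketch in PyTorch; the two-modality setup, the LateFusionClassifier name, and all feature and layer sizes are illustrative assumptions, not any specific paper's architecture:

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Toy two-modality classifier: encode each modality separately,
    then concatenate the embeddings before a shared classification head."""
    def __init__(self, audio_dim=40, text_dim=300, hidden=64, n_classes=4):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, audio, text):
        # Complementary information from both modalities meets here.
        fused = torch.cat([self.audio_enc(audio), self.text_enc(text)], dim=-1)
        return self.head(fused)

model = LateFusionClassifier()
logits = model(torch.randn(8, 40), torch.randn(8, 300))  # batch of 8 examples
print(logits.shape)  # torch.Size([8, 4])
```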
In particular, we also investigate a special case of multi-modality learning (MML): cross-modality learning (CML), which arises widely in remote sensing (RS) image classification applications.
The goal of score following is to track a musical performance, usually in the form of audio, in a corresponding score representation.
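One classic way to frame score following is as an alignment problem between performance features and score features. Below is a minimal offline dynamic-time-warping sketch (real score followers typically run online); the one-hot toy features and the dtw_align helper are illustrative assumptions:

```python
import numpy as np

def dtw_align(perf, score):
    """Align performance frames to score frames with classic DTW.
    perf: (m, d) performance feature matrix; score: (n, d) score features."""
    m, n = len(perf), len(score)
    cost = np.linalg.norm(perf[:, None, :] - score[None, :, :], axis=-1)
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j],      # insertion
                                                 acc[i, j - 1],      # deletion
                                                 acc[i - 1, j - 1])  # match
    # Backtrack to recover the warping path (performance frame -> score frame).
    path, i, j = [], m, n
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy usage: a "performance" that lingers on the score's second frame.
score = np.eye(4)
perf = score[[0, 1, 1, 2, 3]]
print(dtw_align(perf, score))  # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 3)]
```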
Our work improves on existing multimodal deep learning algorithms in two essential ways: (1) it presents a novel method for performing cross-modality information exchange before features are learned from the individual modalities, and (2) it extends the previously proposed cross-connections, which only transfer information between streams that process compatible data.
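A minimal sketch of the cross-connection idea, in which two modality streams exchange projected intermediate features; the TwoStreamWithCross name, the additive exchange, and all layer sizes are illustrative assumptions, not the cited method itself:

```python
import torch
import torch.nn as nn

class TwoStreamWithCross(nn.Module):
    """Toy two-stream network whose streams exchange intermediate features
    via learned projections ("cross-connections") before the fusion head."""
    def __init__(self, dim_a=32, dim_b=48, hidden=64, n_classes=10):
        super().__init__()
        self.a1 = nn.Linear(dim_a, hidden)
        self.b1 = nn.Linear(dim_b, hidden)
        # Cross-connections: project one stream's features into the other's space.
        self.a_to_b = nn.Linear(hidden, hidden)
        self.b_to_a = nn.Linear(hidden, hidden)
        self.a2 = nn.Linear(hidden, hidden)
        self.b2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, xa, xb):
        ha, hb = torch.relu(self.a1(xa)), torch.relu(self.b1(xb))
        # Each stream receives a projected copy of the other's features.
        ha2 = torch.relu(self.a2(ha + self.b_to_a(hb)))
        hb2 = torch.relu(self.b2(hb + self.a_to_b(ha)))
        return self.head(torch.cat([ha2, hb2], dim=-1))

model = TwoStreamWithCross()
print(model(torch.randn(4, 32), torch.randn(4, 48)).shape)  # torch.Size([4, 10])
```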
In recent years, natural language descriptions have been used to obtain information about the discriminative parts of objects.
Ranked #1 on Multimodal Deep Learning on CUB-200-2011
Memes on the Internet are often harmless and sometimes amusing.
Ranked #1 on Meme Classification on Hateful Memes (using extra training data)
Emotion Recognition is a challenging research area given its complex nature, and humans express emotional cues across various modalities such as language, facial expressions, and speech.
Multimedia content in social media platforms provides significant information during disaster events.
Ranked #1 on Disaster Response on CrisisMMD