A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis

Sentiment and emotion understanding are two crucial factors in human multimodal language. This paper describes a Transformer-based joint-encoding (TBJE) for the tasks of Emotion Recognition and Sentiment Analysis. In addition to using the Transformer architecture, our approach relies on modular co-attention and a glimpse layer to jointly encode one or more modalities. The proposed solution was also submitted to the ACL 2020 Second Grand-Challenge on Multimodal Language for evaluation on the CMU-MOSEI dataset. The code to replicate the presented experiments is open-source: https://github.com/jbdel/MOSEI_UMONS.
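The abstract names two architectural pieces: modular co-attention (one modality attends over another inside a Transformer block) and a glimpse layer (attention pooling that turns a variable-length sequence into a fixed-size joint encoding). Below is a minimal PyTorch sketch of both ideas; module names, dimensions, and pooling details are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttentionBlock(nn.Module):
    """One co-attention block: the primary modality (e.g. text) self-attends,
    then cross-attends over a second modality (e.g. acoustic features)."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, y):
        x = self.norm1(x + self.self_attn(x, x, x)[0])   # intra-modal attention
        x = self.norm2(x + self.cross_attn(x, y, y)[0])  # attend over the other modality
        return self.norm3(x + self.ffn(x))

class Glimpse(nn.Module):
    """Attention pooling: G learned 'glimpses' each compute a weighted summary
    of the sequence; the summaries are concatenated into one fixed vector."""
    def __init__(self, d_model=512, n_glimpses=2):
        super().__init__()
        self.proj = nn.Linear(d_model, n_glimpses)

    def forward(self, x):                      # x: (batch, seq, d_model)
        attn = F.softmax(self.proj(x), dim=1)  # one weight per position per glimpse
        pooled = torch.einsum('bsg,bsd->bgd', attn, x)
        return pooled.flatten(1)               # (batch, n_glimpses * d_model)

# Usage sketch: fuse text with acoustic features, then pool for classification.
# txt, aud = torch.randn(2, 20, 512), torch.randn(2, 50, 512)
# summary = Glimpse()(CoAttentionBlock()(txt, aud))  # shape (2, 1024)
```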

Datasets

CMU-MOSEI
Results from the Paper


Ranked #5 on Multimodal Sentiment Analysis on CMU-MOSEI (using extra training data)

Task                          | Dataset   | Model                            | Metric   | Value | Global Rank | Uses Extra Training Data
Multimodal Sentiment Analysis | CMU-MOSEI | Transformer-based joint-encoding | Accuracy | 82.48 | #5          | Yes
