Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation

4 Oct 2023 · Shihao Zou, Xianying Huang, Xudong Shen

Emotion Recognition in Conversation (ERC) plays an important role in driving the development of human-machine interaction. Emotions can be expressed across multiple modalities, and multimodal ERC faces two main problems: (1) noise introduced during cross-modal information fusion, and (2) predicting few-sample emotion labels that are semantically similar to other labels but belong to different categories. To address these issues and make full use of each modality's features, we adopt the following strategies: first, deep emotion cues are extracted from the modalities with strong representation ability, and feature filters are designed to convert the modalities with weak representation ability into multimodal prompt information. We then design a Multimodal Prompt Transformer (MPT) to perform cross-modal information fusion. MPT embeds the multimodal fusion information into each attention layer of the Transformer, allowing the prompt information to participate in encoding textual features and to be fused with multi-level textual information, which yields better multimodal fusion features. Finally, we use a Hybrid Contrastive Learning (HCL) strategy to improve the model's handling of few-sample labels: unsupervised contrastive learning strengthens the representation of the multimodal fusion features, while supervised contrastive learning mines the information carried by few-sample labels. Experimental results show that our proposed model outperforms state-of-the-art ERC models on two benchmark datasets.
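
To make the fusion mechanism concrete, below is a minimal PyTorch sketch of the idea: multimodal prompt tokens are concatenated into the keys and values of every attention layer, so the prompts participate in encoding the textual features at each level. The class names, dimensions, and layer details are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PromptAttentionLayer(nn.Module):
    """One Transformer layer whose attention also attends to multimodal
    prompt tokens (e.g. filtered audio/visual features). A hypothetical
    re-implementation of the idea, not the authors' code."""

    def __init__(self, d_model=768, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads,
                                          dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, text, prompt):
        # Keys/values = [prompt tokens; text tokens], queries = text tokens,
        # so the prompt information takes part in encoding the text.
        kv = torch.cat([prompt, text], dim=1)
        attn_out, _ = self.attn(query=text, key=kv, value=kv)
        text = self.norm1(text + self.drop(attn_out))
        return self.norm2(text + self.drop(self.ff(text)))

class MultimodalPromptTransformer(nn.Module):
    """Stack of prompt-attention layers; the fused prompt is injected into
    every layer, fusing it with multi-level textual information."""

    def __init__(self, n_layers=4, d_model=768, n_heads=8):
        super().__init__()
        self.layers = nn.ModuleList(
            PromptAttentionLayer(d_model, n_heads) for _ in range(n_layers))

    def forward(self, text, prompt):
        for layer in self.layers:
            text = layer(text, prompt)
        return text

# Usage: fuse prompt tokens built from audio/visual features into text.
mpt = MultimodalPromptTransformer()
text = torch.randn(2, 40, 768)    # (batch, text tokens, dim)
prompt = torch.randn(2, 6, 768)   # (batch, prompt tokens, dim)
fused = mpt(text, prompt)         # -> (2, 40, 768)
```

Injecting the prompt at every layer, rather than only at the input, is what lets the weak-modality information interact with both low-level and high-level textual features.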

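The hybrid contrastive objective can be sketched in the same spirit: an unsupervised NT-Xent term over two views of the fused features improves representation quality, while a supervised contrastive term pulls together utterances that share a label, giving few-sample classes extra training signal. The view-generation strategy, temperature `tau`, and mixing weight `lam` below are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn.functional as F

def unsupervised_contrastive(z1, z2, tau=0.07):
    """NT-Xent between two views of the fused representations
    (e.g. two dropout-perturbed forward passes); symmetrized."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(len(z1), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def supervised_contrastive(z, labels, tau=0.07):
    """Supervised contrastive term: same-label utterances are positives,
    so rare emotion labels still receive gradient from their few peers."""
    z = F.normalize(z, dim=1)
    sim = (z @ z.t()) / tau
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # drop self-pairs
    pos = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    n_pos = pos.sum(1).clamp(min=1)                  # avoid divide-by-zero
    return -(log_prob.masked_fill(~pos, 0.0).sum(1) / n_pos).mean()

def hybrid_contrastive_loss(z1, z2, labels, lam=0.5):
    # `lam` trades off the unsupervised and supervised terms (assumed).
    return lam * unsupervised_contrastive(z1, z2) + \
           (1 - lam) * supervised_contrastive(z1, labels)
```

In training, a loss of this form would be added to the usual cross-entropy classification loss computed on the fused features.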

Datasets

IEMOCAP · MELD

Task | Dataset | Model | Metric | Value | Global Rank
Emotion Recognition in Conversation | IEMOCAP | MPT-HCL | Weighted-F1 | 72.51 | #2
Emotion Recognition in Conversation | IEMOCAP | MPT-HCL | Accuracy | 72.83 | #2
Emotion Recognition in Conversation | MELD | MPT-HCL | Weighted-F1 | 65.02 | #30
Emotion Recognition in Conversation | MELD | MPT-HCL | Accuracy | 65.86 | #9
