CFN-ESA: A Cross-Modal Fusion Network with Emotion-Shift Awareness for Dialogue Emotion Recognition

28 Jul 2023  ·  Jiang Li, Xiaoping Wang, Yingjian Liu, Zhigang Zeng

Multimodal emotion recognition in conversation (ERC) has attracted growing attention across research communities. In this paper, we propose a Cross-modal Fusion Network with Emotion-Shift Awareness (CFN-ESA) for ERC. Existing approaches treat all modalities equally, without distinguishing how much emotional information each carries, which makes it hard to adequately extract complementary information from multimodal data. To address this, CFN-ESA treats the textual modality as the primary source of emotional information and the visual and acoustic modalities as secondary sources. Moreover, most multimodal ERC models ignore emotion-shift information and focus excessively on contextual information, causing recognition failures in emotion-shift scenarios; we design an emotion-shift module to address this challenge. CFN-ESA mainly consists of a unimodal encoder (RUME), a cross-modal encoder (ACME), and an emotion-shift module (LESM). RUME extracts conversation-level contextual emotional cues while pulling the data distributions of the modalities closer together; ACME performs multimodal interaction centered on the textual modality; LESM models emotion shifts and captures emotion-shift information, thereby guiding the learning of the main task. Experimental results demonstrate that CFN-ESA effectively improves ERC performance and clearly outperforms state-of-the-art models.
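To make the text-centered fusion idea concrete, the following is a minimal sketch of cross-modal attention in which textual utterance features act as queries over the visual and acoustic streams, so fusion is driven by the (emotionally richer) text. This is an illustration under our own assumptions, not the CFN-ESA implementation; all module and argument names are hypothetical.

```python
import torch
import torch.nn as nn

class TextCenteredCrossModalAttention(nn.Module):
    """Hypothetical sketch: text queries attend to visual/acoustic keys
    and values, then the three views are fused by a linear projection."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn_v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.out = nn.Linear(3 * dim, dim)

    def forward(self, text, visual, audio):
        # text/visual/audio: (batch, num_utterances, dim) utterance features
        t2v, _ = self.attn_v(text, visual, visual)  # text attends to vision
        t2a, _ = self.attn_a(text, audio, audio)    # text attends to audio
        fused = torch.cat([text, t2v, t2a], dim=-1)
        return self.out(fused)

# toy usage: batch of 2 dialogues, 10 utterances, 128-dim features
t, v, a = (torch.randn(2, 10, 128) for _ in range(3))
fusion = TextCenteredCrossModalAttention(128)
print(fusion(t, v, a).shape)  # torch.Size([2, 10, 128])
```

Similarly, the emotion-shift signal that LESM learns from can be loosely illustrated by deriving binary shift labels from consecutive utterance emotions; again, this hypothetical helper is our own simplification, not the paper's exact formulation.

```python
import torch

def emotion_shift_labels(emotions: torch.Tensor) -> torch.Tensor:
    """Mark an utterance with 1 when its emotion label differs from the
    previous utterance's, else 0 (the first utterance gets 0)."""
    shift = (emotions[1:] != emotions[:-1]).long()
    return torch.cat([torch.zeros(1, dtype=torch.long), shift])

print(emotion_shift_labels(torch.tensor([0, 0, 2, 2, 1])))
# tensor([0, 0, 1, 0, 1])
```

Such shift labels can serve as an auxiliary supervision target alongside the main emotion-classification loss, which matches the paper's description of LESM guiding the learning of the main task.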


Results


Task                                 Dataset  Model    Metric       Value  Global Rank
Emotion Recognition in Conversation  IEMOCAP  CFN-ESA  Weighted-F1  71.04  #8
Emotion Recognition in Conversation  IEMOCAP  CFN-ESA  Accuracy     70.78  #5
Emotion Recognition in Conversation  MELD     CFN-ESA  Weighted-F1  66.70  #11
Emotion Recognition in Conversation  MELD     CFN-ESA  Accuracy     67.85  #3
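For reference, the Weighted-F1 reported above is conventionally the support-weighted mean of per-class F1 scores, a standard choice for the imbalanced label distributions of IEMOCAP and MELD. A minimal sketch with toy labels (not the paper's data):

```python
from sklearn.metrics import f1_score, accuracy_score

y_true = [0, 1, 2, 2, 1, 0, 2]  # toy gold emotion labels
y_pred = [0, 1, 2, 1, 1, 0, 2]  # toy model predictions

# per-class F1 scores averaged with weights proportional to class support
print(f1_score(y_true, y_pred, average="weighted"))
print(accuracy_score(y_true, y_pred))
```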
