TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Multimodal Emotion Recognition	IEMOCAP	CORECT (4-class)	F1	0.846	# 1
Multimodal Emotion Recognition	IEMOCAP	CORECT (4-class)	Weighted Accuracy (WA)	0.847	# 1
Multimodal Emotion Recognition	IEMOCAP	CORECT (4-class)	Weighted F1	0.846	# 1
Multimodal Emotion Recognition	IEMOCAP	CORECT (6-class)	F1	0.702	# 9
Multimodal Emotion Recognition	IEMOCAP	CORECT (6-class)	Weighted Accuracy (WA)	0.699	# 4
Multimodal Emotion Recognition	IEMOCAP	CORECT (6-class)	Weighted F1	0.702	# 2
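
The "Weighted F1" rows above are conventionally read as the mean of per-class F1 scores, weighted by each class's share of the true labels. The page itself does not define the metric, so the sketch below assumes that conventional definition; the function name `weighted_f1` and the toy emotion labels are hypothetical and are not taken from the CORECT code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 (assumed definition, see lead-in)."""
    support = Counter(y_true)          # number of true examples per class
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Each class contributes in proportion to its share of the true labels.
        score += f1 * count / total
    return score


# Hypothetical toy utterance labels, not IEMOCAP data.
y_true = ["happy", "happy", "sad", "neutral"]
y_pred = ["happy", "sad", "sad", "neutral"]
print(weighted_f1(y_true, y_pred))
```

Because the weights follow class frequency, a frequent class moves this score more than a rare one, which is one reason the plain F1 and Weighted F1 rows can rank differently.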

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/conversation-understanding-using-relational/multimodal-emotion-recognition-on-iemocap)](https://paperswithcode.com/sota/multimodal-emotion-recognition-on-iemocap?p=conversation-understanding-using-relational)`

Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

8 Nov 2023 · Cam-Van Thi Nguyen, Anh-Tuan Mai, The-Son Le, Hai-Dang Kieu, Duc-Trong Le

Emotion recognition is a crucial task for human conversation understanding. It becomes more challenging with multimodal data, e.g., language, voice, and facial expressions. As a typical solution, global and local context information is exploited to predict the emotional label for every single sentence, i.e., utterance, in the dialogue. Specifically, the global representation can be captured by modeling cross-modal interactions at the conversation level. The local one is often inferred using the temporal information of speakers or emotional shifts, which neglects vital factors at the utterance level. Additionally, most existing approaches take fused features of multiple modalities as a unified input without leveraging modality-specific representations. Motivated by these problems, we propose the Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT), a novel neural network framework that effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies in a modality-specific manner for conversation understanding. Extensive experiments demonstrate the effectiveness of CORECT via its state-of-the-art results on the IEMOCAP and CMU-MOSEI datasets for the multimodal ERC task.

Code

leson502/CORECT_EMNLP2023 official

Tasks

Emotion Recognition

Multimodal Emotion Recognition

Datasets

IEMOCAP

CMU-MOSEI

Results from the Paper

Ranked #1 on Multimodal Emotion Recognition on IEMOCAP

Methods

Graph Neural Network
