AOG-LSTM: An adaptive attention neural network for visual storytelling

Visual storytelling is the task of generating a relevant story for a given image sequence, and it has received significant attention. However, using general RNNs (such as LSTM and GRU) as the decoder limits the performance of models on this task, because they cannot differentiate between different types of information representations. In addition, optimizing the probability of each subsequent word conditioned on the previous ground-truth sequence causes error accumulation during inference. Moreover, the existing method for alleviating error accumulation, based on replacing reference words, does not take into account the different effects of individual words. To address these problems, we propose a modified neural network named AOG-LSTM and a modified training strategy named ARS. AOG-LSTM adaptively pays appropriate attention to different information representations within it when predicting different words. During training, ARS replaces some words in the reference sentences with model predictions, as in the existing method, but uses a selection network and a selection strategy to choose more appropriate words to replace, which improves the model further. Experiments on the VIST dataset demonstrate that our model outperforms several strong baselines on the most commonly used metrics.
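The abstract outlines two components: an LSTM variant that adaptively weighs different information representations at each decoding step, and a training strategy (ARS) that replaces some reference words with model predictions chosen by a selection network. The PyTorch-style sketch below illustrates both ideas under our own assumptions; it is not the authors' implementation, and names such as AdaptiveGate and replace_with_predictions are hypothetical.

```python
# Minimal sketch (not the paper's code) of adaptive attention over information
# representations and ARS-style word replacement. All module/function names
# and the exact gating form are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGate(nn.Module):
    """Weighs several information representations (e.g. image features,
    sentence context, previous-word embedding) before each LSTM step."""
    def __init__(self, dims, hidden_size):
        super().__init__()
        # project each representation into a common space
        self.proj = nn.ModuleList([nn.Linear(d, hidden_size) for d in dims])
        self.score = nn.Linear(hidden_size * 2, 1)

    def forward(self, reps, h_prev):
        # reps: list of tensors, each (batch, dim_i); h_prev: (batch, hidden)
        projected = [p(r) for p, r in zip(self.proj, reps)]
        stacked = torch.stack(projected, dim=1)          # (batch, n_reps, hidden)
        h_exp = h_prev.unsqueeze(1).expand_as(stacked)
        alpha = F.softmax(self.score(torch.cat([stacked, h_exp], dim=-1)), dim=1)
        return (alpha * stacked).sum(dim=1)              # fused input for the LSTM cell

def replace_with_predictions(targets, logits, scores, ratio=0.25):
    """ARS-style replacement sketch: swap a fraction of reference words with
    greedy model predictions, preferring the positions that a selection
    network's scores rank as most suitable for replacement (assumption)."""
    batch, length = targets.shape
    k = max(1, int(length * ratio))
    _, positions = scores.topk(k, dim=1)                 # chosen positions per sentence
    predictions = logits.argmax(dim=-1)                  # greedy model predictions
    mixed = targets.clone()
    mixed.scatter_(1, positions, predictions.gather(1, positions))
    return mixed
```

In this sketch the fused representation produced by AdaptiveGate would be fed to the LSTM cell at each step, and the mixed target sequence from replace_with_predictions would be used as decoder input during training instead of the pure ground-truth sequence.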


Datasets

VIST
Results from the Paper


Task: Visual Storytelling   Dataset: VIST   Model: AOG + ARS

Metric     Value   Global Rank
BLEU-1     69      # 1
BLEU-2     44      # 1
BLEU-3     23.9    # 4
BLEU-4     12.9    # 20
METEOR     36.0    # 6
CIDEr      12.0    # 4
ROUGE-L    30.1    # 11
