When an Image Tells a Story: The Role of Visual and Semantic Information for Generating Paragraph Descriptions

INLG (ACL) 2020  ·  Nikolai Ilinykh, Simon Dobnik

Generating multi-sentence image descriptions is a challenging task which requires a model to produce coherent and accurate paragraphs that describe the salient objects in an image. We argue that multiple sources of information are beneficial when describing visual scenes with long sequences. These include (i) perceptual information and (ii) semantic (language) information about how to describe what is in the image. We also compare the effects of using two different pooling mechanisms on either a single modality or their combination. We demonstrate that a model which utilises both visual and language inputs can generate accurate and diverse paragraphs when combined with an appropriate pooling mechanism. The results of our automatic and human evaluation show that learning to embed semantic information along with visual stimuli into the paragraph generation model is not trivial, raising a variety of proposals for future experiments.
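As a rough illustration only (not the authors' implementation), fusing pooled visual features with pooled language features into a single context vector for a paragraph decoder might look like the sketch below. The module names, feature dimensions, and the mean/max pooling choices are assumptions made for the example.

```python
# Minimal sketch (PyTorch), NOT the paper's code: pool and fuse visual and
# language features into one context vector. Dimensions are illustrative.
import torch
import torch.nn as nn


class FusedContext(nn.Module):
    def __init__(self, img_dim=2048, lng_dim=300, hidden_dim=512, pooling="mean"):
        super().__init__()
        self.pooling = pooling                      # "mean" or "max"
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.lng_proj = nn.Linear(lng_dim, hidden_dim)

    def pool(self, feats):
        # feats: (batch, num_items, dim) -> (batch, dim)
        if self.pooling == "max":
            return feats.max(dim=1).values
        return feats.mean(dim=1)

    def forward(self, img_feats, lng_feats):
        # img_feats: (batch, num_regions, img_dim), e.g. detector region features
        # lng_feats: (batch, num_tokens, lng_dim), e.g. word embeddings of a description
        img = self.img_proj(self.pool(img_feats))
        lng = self.lng_proj(self.pool(lng_feats))
        # Concatenate the two modalities into one context vector for the decoder
        return torch.cat([img, lng], dim=-1)        # (batch, 2 * hidden_dim)


if __name__ == "__main__":
    model = FusedContext(pooling="max")
    img = torch.randn(2, 36, 2048)   # 36 detected regions per image
    lng = torch.randn(2, 20, 300)    # 20 word embeddings
    print(model(img, lng).shape)     # torch.Size([2, 1024])
```

The same module could be instantiated with `pooling="mean"` to compare the two pooling mechanisms under otherwise identical settings.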

Task:    Image Paragraph Captioning
Dataset: Image Paragraph Captioning
Model:   IMG+LNG

Metric    Value    Global Rank
BLEU-4     4.67    #10
METEOR    11.30    #10
CIDEr     26.38    #3
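For reference, scores like the BLEU-4 value above are corpus-level n-gram overlap measures. The snippet below is a toy illustration of computing BLEU-4 with NLTK (an assumed tooling choice; METEOR and CIDEr are typically computed with the pycocoevalcap toolkit, and the example texts here are made up, not from the benchmark).

```python
# Illustrative only: corpus-level BLEU-4 on toy data with NLTK.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

references = [
    [["a", "man", "is", "standing", "on", "a", "beach", "near", "the", "water"]],
]
hypotheses = [
    ["a", "man", "stands", "on", "the", "beach", "by", "the", "water"],
]

smooth = SmoothingFunction().method1
bleu4 = corpus_bleu(
    references,
    hypotheses,
    weights=(0.25, 0.25, 0.25, 0.25),  # uniform 1- to 4-gram weights = BLEU-4
    smoothing_function=smooth,
)
print(f"BLEU-4: {bleu4:.4f}")
```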

Methods


No methods listed for this paper.