Image Caption Generation Framework for Assamese News using Attention Mechanism

ICON 2021 · Ringki Das, Thoudam Doren Singh

Automatic caption generation is an artificial intelligence problem at the intersection of computer vision and natural language processing. Although a significant body of work has been reported in image captioning, it is limited to English and a few other major languages with sufficient resources; no work on image captioning has been reported for a resource-constrained language like Assamese. Motivated by this gap, we propose an encoder-decoder framework for image caption generation in the Assamese news domain. A pre-trained VGG-16 model is employed on the encoder side, and an LSTM with an attention mechanism is employed on the decoder side to generate the Assamese caption. We train the proposed model on an in-house dataset of 10,000 images, each with a single caption. We describe our experimental methodology and report quantitative and qualitative results that validate the effectiveness of our model for caption generation. The proposed model achieves a BLEU score of 12.1, outperforming the baseline model.
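The abstract does not specify which attention variant the decoder uses. The following is a minimal sketch of one decoding step, assuming Bahdanau-style additive attention: each encoder region vector is scored against the current LSTM hidden state, the scores are normalised with a softmax, and the resulting weights form a context vector that the decoder would consume when predicting the next Assamese word. It further assumes the encoder output is a set of region vectors (for example, VGG-16's last convolutional layer yields 14 × 14 = 196 vectors of dimension 512 for a 224 × 224 input). The function `additive_attention`, the projection matrices `W_f` and `W_h`, the scoring vector `v`, and the tiny random inputs are all hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def additive_attention(features, hidden, W_f, W_h, v):
    """Hypothetical additive attention step (assumed variant, not from the paper).

    features: L encoder region vectors, each of dimension D
    hidden:   decoder LSTM hidden state, dimension H
    W_f: A x D projection, W_h: A x H projection, v: scoring vector of length A
    Returns (context vector of dimension D, attention weights of length L).
    """
    h_proj = matvec(W_h, hidden)
    scores = []
    for f in features:
        f_proj = matvec(W_f, f)
        # e_i = v^T tanh(W_f f_i + W_h h)
        scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, f_proj, h_proj)))
    alpha = softmax(scores)
    dim = len(features[0])
    # Context is the attention-weighted sum of the region vectors.
    context = [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(dim)]
    return context, alpha


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Tiny illustrative sizes; random values stand in for real VGG-16 features.
random.seed(0)
L, D, H, A = 4, 8, 6, 5
features = rand_matrix(L, D)
hidden = [random.uniform(-1, 1) for _ in range(H)]
v = [random.uniform(-1, 1) for _ in range(A)]
context, alpha = additive_attention(features, hidden, rand_matrix(A, D), rand_matrix(A, H), v)
print([round(a, 3) for a in alpha])
```

Because the weights are a softmax, they are non-negative and sum to one, so the context vector always lies within the range spanned by the region features; with learned projections, the weights indicate which image regions the decoder attends to at each word.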
