MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning

ACL 2020 · Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal

Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph. Towards this goal, we propose a new approach called Memory-Augmented Recurrent Transformer (MART), which uses a memory module to augment the transformer architecture...



Methods used in the Paper