Order-Prompted Tag Sequence Generation for Video Tagging

Video tagging aims to infer multiple tags that cover the relevant content of a given video. Video tags are typically freely defined and uploaded by a wide variety of users, which gives them two characteristics: they are abundant in quantity, and they are unordered within each video. Existing multi-label classification and generation methods are difficult to adapt directly to this task. This paper proposes a novel generative model, Order-Prompted Tag Sequence Generation (OP-TSG), designed around these characteristics. It treats video tagging as a tag sequence generation problem guided by sample-dependent order prompts. These prompts are semantically aligned with the tags and make it possible to decouple the tag generation order, so that the model can focus on modeling the dependencies among tags. Moreover, the word-based generation strategy enables the model to generate novel tags. To verify the effectiveness and generalization of the proposed method, two benchmarks are established: CREATE-tagging, a Chinese video tagging benchmark, and Pexel-tagging, an English image tagging benchmark. Extensive results show that OP-TSG is significantly superior to other methods. In particular, results on rare tags improve by 3.3% and 3% over SOTA methods on CREATE-tagging and Pexel-tagging, respectively, and the novel tags generated on CREATE-tagging yield a tag gain of 7.04%.
