no code implementations • 3 Aug 2022 • Xingchen Li, Long Chen, Jian Shao, Shaoning Xiao, Songyang Zhang, Jun Xiao
Current Scene Graph Generation (SGG) methods tend to predict frequent predicate categories and fail to recognize rare ones due to the severe imbalanced distribution of predicates.
no code implementations • 25 Apr 2022 • Shaoning Xiao, Long Chen, Kaifeng Gao, Zhao Wang, Yi Yang, Zhimeng Zhang, Jun Xiao
From the view of feature, we break down the video into trajectories and first leverage trajectory feature in VideoQA to enhance the alignment between two modalities.
1 code implementation • EMNLP 2021 • Shaoning Xiao, Long Chen, Jian Shao, Yueting Zhuang, Jun Xiao
Given an untrimmed video and a natural language query, Natural Language Video Localization (NLVL) aims to identify the video moment described by the query.
no code implementations • 26 May 2021 • Feifei Shao, Long Chen, Jian Shao, Wei Ji, Shaoning Xiao, Lu Ye, Yueting Zhuang, Jun Xiao
With the success of deep neural networks in object detection, both WSOD and WSOL have received unprecedented attention.
no code implementations • 15 Mar 2021 • Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, Jun Xiao
State-of-the-art NLVL methods are almost in one-stage fashion, which can be typically grouped into two categories: 1) anchor-based approach: it first pre-defines a series of video segment candidates (e. g., by sliding window), and then does classification for each candidate; 2) anchor-free approach: it directly predicts the probabilities for each video frame as a boundary or intermediate frame inside the positive segment.