Search Results for author: Xiujun Shu

Found 15 papers, 9 papers with code

Unified and Dynamic Graph for Temporal Character Grouping in Long Videos

no code implementations • 27 Aug 2023 • Xiujun Shu, Wei Wen, Liangsheng Xu, Mingbao Lin, Ruizhi Qiao, Taian Guo, Hanjun Li, Bei Gan, Xiao Wang, Xing Sun

In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping.

Clustering Graph Clustering

Paper
Add Code

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation

1 code implementation • ICCV 2023 • Hanjun Li, Xiujun Shu, Sunan He, Ruizhi Qiao, Wei Wen, Taian Guo, Bei Gan, Xing Sun

Under this setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and a Dynamic Gaussian prior Adjustment module (DGA).

Ranked #10 on Temporal Sentence Grounding on Charades-STA

Contrastive Learning Sentence +1

Paper
Code

Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies

1 code implementation • CVPR 2023 • Bei Gan, Xiujun Shu, Ruizhi Qiao, Haoqian Wu, Keyu Chen, Hanjun Li, Bo Ren

Based on existing efforts, this work has two observations: (1) For different annotators, labeling highlight has uncertainty, which leads to inaccurate and time-consuming annotations.

Highlight Detection Learning with noisy labels +1

Paper
Code

NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation

no code implementations • CVPR 2023 • Haoqian Wu, Keyu Chen, Haozhe Liu, Mingchen Zhuge, Bing Li, Ruizhi Qiao, Xiujun Shu, Bei Gan, Liangsheng Xu, Bo Ren, Mengmeng Xu, Wentian Zhang, Raghavendra Ramachandra, Chia-Wen Lin, Bernard Ghanem

Temporal video segmentation is the get-to-go automatic video analysis, which decomposes a long-form video into smaller components for the following-up understanding tasks.

Video Segmentation Video Semantic Segmentation

Paper
Add Code

VLMAE: Vision-Language Masked Autoencoder

no code implementations • 19 Aug 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Chen Wu, Xiujun Shu, Bo Ren

Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data.

Language Modelling Question Answering +4

Paper
Add Code

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

1 code implementation • 18 Aug 2022 • Xiujun Shu, Wei Wen, Haoqian Wu, Keyu Chen, Yiran Song, Ruizhi Qiao, Bo Ren, Xiao Wang

To explore the fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM).

Person Retrieval Retrieval +3