Search Results for author: Chaolei Tan

Found 6 papers, 0 papers with code

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

no code implementations • 21 Mar 2024 • Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu

This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question.

Multi-Label Classification · Question Answering +1
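As a rough illustration of the distillation idea named in the title, the sketch below distills soft scores over a large answer set from a teacher into a student. This is a generic KL-based formulation under assumed shapes, not the paper's ranking loss; all names are illustrative.

```python
# Minimal sketch: KL-based score distillation over a large answer vocabulary.
# This is a generic formulation, NOT the paper's exact ranking objective.
import torch
import torch.nn.functional as F

def distill_answer_scores(student_logits, teacher_logits, temperature=2.0):
    """Soften teacher scores over the answer set and distill them into the student."""
    t = F.log_softmax(teacher_logits / temperature, dim=-1)
    s = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened distributions, scaled by T^2.
    return F.kl_div(s, t, reduction="batchmean", log_target=True) * temperature ** 2

# Usage: a batch of 8 questions scored against 4000 candidate answers.
loss = distill_answer_scores(torch.randn(8, 4000), torch.randn(8, 4000))
```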

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

no code implementations • 18 Mar 2024 • Chaolei Tan, Jian-Huang Lai, Wei-Shi Zheng, Jian-Fang Hu

Unlike previous weakly-supervised grounding frameworks that rely on multiple instance learning or reconstruction learning for two-stage candidate ranking, we propose a novel siamese learning framework that jointly learns cross-modal feature alignment and temporal coordinate regression without timestamp labels, achieving concise one-stage localization for weakly-supervised video paragraph grounding (WSVPG).

Multiple Instance Learning
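As a loose sketch of the one-stage siamese objective described above, the following pairs an InfoNCE-style cross-modal alignment loss with a cross-view regression consistency loss; the modules, losses, and coordinate parameterization are assumptions, not the paper's formulation.

```python
# Loose sketch: siamese branches share weights across two views of a video;
# the losses below (InfoNCE alignment + cross-view regression consistency)
# are illustrative and do not reproduce the paper's formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseGrounder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.video_proj = nn.Linear(dim, dim)  # shared between both views
        self.text_proj = nn.Linear(dim, dim)
        self.reg_head = nn.Linear(dim, 2)      # predicts (center, width) in [0, 1]

    def forward(self, video_feat, text_feat):
        # video_feat, text_feat: (B, dim) pooled features.
        v = F.normalize(self.video_proj(video_feat), dim=-1)
        t = F.normalize(self.text_proj(text_feat), dim=-1)
        coords = torch.sigmoid(self.reg_head(video_feat + text_feat))
        return v, t, coords

def siamese_loss(model, view1, view2, text):
    v1, t1, c1 = model(view1, text)
    _, _, c2 = model(view2, text)
    # Alignment: matched video-text pairs outscore in-batch negatives.
    logits = v1 @ t1.T / 0.07
    align = F.cross_entropy(logits, torch.arange(len(v1)))
    # Consistency: both views must regress the same moment, giving a
    # coordinate-level training signal without any timestamp labels.
    return align + F.l1_loss(c1, c2)
```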

Collaborative Static and Dynamic Vision-Language Streams for Spatio-Temporal Video Grounding

no code implementations • CVPR 2023 • Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng

The static stream performs cross-modal understanding in a single frame and learns to attend to the target object spatially according to intra-frame visual cues like object appearances.

Object · Spatio-Temporal Video Grounding +1
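The static stream's text-conditioned spatial attention can be sketched as below; the dot-product scoring and pooled sentence embedding are assumptions for illustration, not the paper's exact design.

```python
# Sketch: text-conditioned spatial attention within a single frame.
# Dot-product scoring and a pooled sentence embedding are assumptions.
import torch
import torch.nn.functional as F

def spatial_attention(frame_feats, text_feat):
    """frame_feats: (HW, D) region features of one frame; text_feat: (D,)."""
    scores = frame_feats @ text_feat / frame_feats.shape[-1] ** 0.5
    return F.softmax(scores, dim=0)  # (HW,) attention over regions

# Usage: a 7x7 feature map flattened to 49 regions of dimension 256.
frame = torch.randn(49, 256)
attn = spatial_attention(frame, torch.randn(256))
obj = attn @ frame  # attention-pooled feature of the target object
```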

Hierarchical Semantic Correspondence Networks for Video Paragraph Grounding

no code implementations • CVPR 2023 • Chaolei Tan, Zihang Lin, Jian-Fang Hu, Wei-Shi Zheng, Jian-Huang Lai

Specifically, we develop a hierarchical encoder that encodes the multi-modal inputs into semantics-aligned representations at different levels.

Sentence · Video Grounding
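A minimal sketch of building representations at several semantic granularities follows, assuming frame/clip/video pooling on the video side and word/sentence/paragraph pooling on the text side; the actual hierarchy and alignment in the paper differ.

```python
# Sketch: paired representations at three granularities. The pooling scheme
# (frames/clips/video vs. words/sentences/paragraph) is an assumption.
import torch

def multilevel_features(video, words, sent_ids, clip_len=8):
    """video: (T, D) frames; words: (W, D); sent_ids: (W,) sentence index per word."""
    n_sent = int(sent_ids.max()) + 1
    sents = torch.stack([words[sent_ids == i].mean(0) for i in range(n_sent)])
    clips = video.unfold(0, clip_len, clip_len).mean(-1)  # (T // clip_len, D)
    return [
        (video, words),                              # fine:   frames vs. words
        (clips, sents),                              # middle: clips vs. sentences
        (video.mean(0, True), words.mean(0, True)),  # coarse: video vs. paragraph
    ]

# Usage: 64 frames, a 3-sentence paragraph of 20 words.
levels = multilevel_features(torch.randn(64, 256), torch.randn(20, 256),
                             torch.tensor([0] * 7 + [1] * 6 + [2] * 7))
```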

STVGFormer: Spatio-Temporal Video Grounding with Static-Dynamic Cross-Modal Understanding

no code implementations • 6 Jul 2022 • Zihang Lin, Chaolei Tan, Jian-Fang Hu, Zhi Jin, Tiancai Ye, Wei-Shi Zheng

The static branch performs cross-modal understanding in a single frame and learns to localize the target object spatially according to intra-frame visual cues like object appearances.

Spatio-Temporal Video Grounding · Video Grounding

Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding

no code implementations • 20 Jun 2021 • Chaolei Tan, Zihang Lin, Jian-Fang Hu, Xiang Li, Wei-Shi Zheng

We propose an effective two-stage approach to the language-based Human-centric Spatio-Temporal Video Grounding (HC-STVG) task.

Spatio-Temporal Video Grounding · Video Grounding
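The two-stage setup can be outlined as below; the stage interfaces are hypothetical placeholders (the paper's first stage augments 2D-TAN for temporal grounding, and the second stage localizes the person spatially within the grounded segment).

```python
# Pipeline outline with hypothetical stage interfaces: `temporal_model`
# returns integer frame indices and `spatial_model` returns a (4,) box.
import torch

def two_stage_hcstvg(frames, query, temporal_model, spatial_model):
    """frames: (T, C, H, W) video tensor; query: language features."""
    # Stage 1: temporal grounding - find the segment described by the query.
    t_start, t_end = temporal_model(frames, query)
    # Stage 2: spatial grounding - box the target person in every frame of
    # the grounded segment, yielding a spatio-temporal tube.
    boxes = [spatial_model(frames[t], query) for t in range(t_start, t_end)]
    return (t_start, t_end), torch.stack(boxes)  # tube as (T', 4) boxes
```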
