Temporal Sentence Grounding

11 papers with code • 1 benchmark • 1 dataset

Temporal sentence grounding (TSG) aims to locate a specific moment in an untrimmed video given a natural language query. The task has been studied under different levels of supervision:

1) Weak supervision: only a video-level action category set;
2) Semi-weak supervision: a video-level action category set plus action annotations at several timestamps;
3) Full supervision: action category and temporal interval annotations for all actions in the untrimmed video.
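To fix the interface concretely, a fully supervised model maps video features and a query to a normalized (start, end) pair. The sketch below is purely illustrative; all module names, dimensions, and the pooling scheme are assumptions, not taken from any listed paper:

```python
import torch
import torch.nn as nn

class ToyTSGModel(nn.Module):
    """Illustrative fully supervised TSG model: video + query -> (start, end)."""
    def __init__(self, video_dim=500, query_dim=300, hidden=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, hidden)
        self.query_proj = nn.Linear(query_dim, hidden)
        self.head = nn.Linear(hidden, 2)  # normalized (start, end)

    def forward(self, video_feats, query_feats):
        # video_feats: (T, video_dim) clip features; query_feats: (L, query_dim) word features
        v = self.video_proj(video_feats).mean(dim=0)  # pool clips
        q = self.query_proj(query_feats).mean(dim=0)  # pool words
        # Fuse and predict boundaries in [0, 1] (ordering not enforced in this toy).
        return torch.sigmoid(self.head(v * q))

model = ToyTSGModel()
span = model(torch.randn(128, 500), torch.randn(12, 300))  # e.g. tensor([0.31, 0.58])
```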

Most implemented papers

Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

mcg-nju/mmn 10 Sep 2021

Viewing temporal grounding as a metric-learning problem, we present a Mutual Matching Network (MMN) to directly model the similarity between language queries and video moments in a joint embedding space.
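Read as metric learning, grounding reduces to embedding queries and candidate moments in one space and ranking by similarity. A minimal sketch, with random embeddings standing in for MMN's actual encoders:

```python
import torch
import torch.nn.functional as F

# Stand-ins for learned moment and query embeddings in a shared space.
moment_emb = F.normalize(torch.randn(100, 256), dim=-1)  # 100 candidate moments
query_emb = F.normalize(torch.randn(1, 256), dim=-1)     # one sentence query

# Grounding as retrieval: score every moment against the query and
# pick the best match in the joint embedding space.
scores = query_emb @ moment_emb.t()  # (1, 100) cosine similarities
best = scores.argmax(dim=-1)         # index of the top-scoring moment
```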

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

yytzsy/SCDM NeurIPS 2019

Temporal sentence grounding in videos aims to detect and localize one target video segment that semantically corresponds to a given sentence.

Uncovering Hidden Challenges in Query-Based Video Moment Retrieval

mayu-ot/hidden-challenges-MR 1 Sep 2020

In this paper, we present a series of experiments assessing how well the benchmark results reflect the true progress in solving the moment retrieval task.

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

liudaizong/CBLN CVPR 2021

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query.
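The "biaffine" in the title refers to scoring start/end clip pairs jointly. A generic biaffine scorer, under assumed shapes and without CBLN's context-aware machinery, looks like this:

```python
import torch

T, d = 128, 256
start_repr = torch.randn(T, d)  # per-clip "start" representations
end_repr = torch.randn(T, d)    # per-clip "end" representations
W = torch.randn(d, d)           # biaffine weight (bias terms omitted)

# The biaffine map scores every (start, end) pair at once: entry (i, j)
# rates the segment running from clip i to clip j.
scores = start_repr @ W @ end_repr.t()
invalid = torch.tril(torch.ones(T, T, dtype=torch.bool), diagonal=-1)
scores = scores.masked_fill(invalid, float("-inf"))  # enforce start <= end
start_idx, end_idx = divmod(scores.argmax().item(), T)
```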

Weakly Supervised Temporal Sentence Grounding With Gaussian-Based Contrastive Proposal Learning

minghangz/cpl CVPR 2022

Moreover, prior methods train their models to distinguish positive visual-language pairs from negative ones randomly collected from other videos, ignoring the highly confusing video segments within the same video.
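The intra-video-negative idea can be made concrete with a generic InfoNCE-style loss (a sketch of the general recipe, not CPL's Gaussian proposal machinery), where the negatives are mined from the same video rather than sampled from other videos:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(query, pos_moment, neg_moments, tau=0.07):
    """Generic InfoNCE: pull the positive moment toward the query,
    push away hard negatives mined from the *same* video."""
    query = F.normalize(query, dim=-1)
    moments = F.normalize(torch.cat([pos_moment[None], neg_moments]), dim=-1)
    logits = (moments @ query) / tau  # (1 + N,) similarities
    # The positive sits at index 0; the loss ranks it above all negatives.
    return F.cross_entropy(logits[None], torch.zeros(1, dtype=torch.long))

loss = contrastive_loss(torch.randn(256), torch.randn(256), torch.randn(8, 256))
```

The interesting part lives entirely in how `neg_moments` is chosen: segments from the same video are far more confusable than clips drawn from unrelated videos.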

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation

solicucu/d3g ICCV 2023

Under the glance-annotation setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning (SA-GCL) module and a Dynamic Gaussian prior Adjustment (DGA) module.
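Glance annotation supplies only a single timestamp inside the target moment. A simple way to exploit it, in the spirit of a Gaussian prior (an illustration of the general idea, not D3G's SA-GCL/DGA modules), is to weight clips by their distance from the glance:

```python
import torch

def gaussian_prior(num_clips, glance_idx, sigma=8.0):
    """Soft pseudo-label over clips: peaks at the glanced timestamp and
    decays for clips farther away; sigma encodes the assumed extent."""
    t = torch.arange(num_clips, dtype=torch.float32)
    return torch.exp(-0.5 * ((t - glance_idx) / sigma) ** 2)

weights = gaussian_prior(num_clips=128, glance_idx=42)  # values in (0, 1]
```

Roughly, the "dynamic" part of the framework adjusts such a prior during training rather than fixing its shape by hand.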

Temporal Sentence Grounding in Streaming Videos

sczwangxiao/tsgvs-mm2023 14 Aug 2023

The goal of TSGSV (temporal sentence grounding in streaming videos) is to evaluate the relevance between a video stream and a given sentence query.
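The defining constraint is causality: at each step the model sees only the clips that have already arrived. A toy online scorer (all names hypothetical; the paper's models are more involved):

```python
import torch
import torch.nn as nn

encoder = nn.GRU(input_size=256, hidden_size=256, batch_first=True)  # causal by construction
query_emb = torch.randn(256)  # stand-in for an encoded sentence query

def stream_scores(clip_stream):
    """Score each incoming clip against the query using only past clips."""
    hidden = None
    for clip in clip_stream:  # clips arrive one at a time
        out, hidden = encoder(clip.view(1, 1, -1), hidden)  # update state; no future access
        yield (out[0, -1] * query_emb).sum()  # relevance of the newest clip

scores = list(stream_scores(torch.randn(10, 256)))  # 10 streamed clips
```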

Learning Temporal Sentence Grounding From Narrated EgoVideos

keflanagan/climer 26 Oct 2023

Compared to traditional benchmarks on which this task is evaluated, these datasets offer finer-grained sentences to ground in notably longer videos.

BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos

Pilhyeon/BAM-DETR 30 Nov 2023

However, existing center-based formulations suffer from center misalignment raised by the inherent ambiguity of moment centers, leading to inaccurate predictions.
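The distinction is easiest to see in the span parameterization: with a center/width head, any error in the ambiguous center shifts both edges at once, whereas boundary-aligned decoding regresses each edge on its own. A schematic comparison (not BAM-DETR's actual decoder):

```python
import torch

def span_from_center(center, width):
    """Center-based decoding: both edges inherit any error in the center."""
    return center - width / 2, center + width / 2

def span_from_boundaries(anchor, start_offset, end_offset):
    """Boundary-aligned decoding: each edge is regressed independently
    from an anchor, so a fuzzy center no longer corrupts both ends."""
    return anchor - start_offset, anchor + end_offset

print(span_from_center(torch.tensor(0.42), torch.tensor(0.18)))
print(span_from_boundaries(torch.tensor(0.42), torch.tensor(0.09), torch.tensor(0.09)))
```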

Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding

sunoh-kim/pps 27 Dec 2023

In weakly supervised temporal video grounding, previous methods use predetermined single-Gaussian proposals, which lack the ability to express the diverse events described by the sentence query.
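The limitation is easy to visualize: a single Gaussian can only cover one contiguous bump, while a mixture can place one mode per sub-event. A sketch of mixture-shaped proposals (the general idea only, not the paper's pull-push learning scheme):

```python
import torch

def gaussian_mixture_proposal(num_clips, centers, sigmas, weights):
    """Proposal mask over clips as a weighted sum of Gaussians, letting
    one query cover several disjoint sub-events."""
    t = torch.arange(num_clips, dtype=torch.float32)
    mask = torch.zeros(num_clips)
    for c, s, w in zip(centers, sigmas, weights):
        mask += w * torch.exp(-0.5 * ((t - c) / s) ** 2)
    return mask / mask.max()  # normalize the peak to 1

# Two sub-events near clips 30 and 90; a single Gaussian would have to
# stretch across both and also cover the irrelevant clips in between.
mask = gaussian_mixture_proposal(128, centers=[30, 90], sigmas=[6.0, 10.0], weights=[1.0, 0.8])
```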