Moment Queries

Egocentric Video-Language Pretraining

showlab/egovlp 3 Jun 2022

Video-Language Pretraining (VLP), which aims to learn transferable representations to advance a wide range of video-text downstream tasks, has recently received increasing attention.

Where a Strong Backbone Meets Strong Features -- ActionFormer for Ego4D Moment Queries Challenge

happyharrycn/actionformer_release 16 Nov 2022

This report describes our submission to the Ego4D Moment Queries Challenge 2022.

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges

opengvlab/ego4d-eccv2022-solutions 17 Nov 2022

In this report, we present our champion solutions to five tracks of the Ego4D challenge.

ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022

JonnyS1226/Ego4d_mq_3rd_solution 17 Nov 2022

Moreover, to better capture long-term temporal dependencies in long videos, we propose a segment-level recurrence mechanism.

Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023

jonnys1226/ego4d_asl 15 Jun 2023

This report presents the ReLER submission to two tracks of the Ego4D Episodic Memory Benchmark at CVPR 2023: Natural Language Queries and Moment Queries.

EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

facebookresearch/EgoVLPv2 ICCV 2023

Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks.

Knowing Where to Focus: Event-aware Transformer for Video Grounding

jinhyunj/eatr ICCV 2023

Recent DETR-based video grounding models directly predict moment timestamps by learning moment queries, without hand-crafted components such as pre-defined proposals or non-maximum suppression.

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

yingsen1/unimd 7 Apr 2024

Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.