Search Results for author: Yin-Dong Zheng

Found 8 papers, 4 papers with code

VideoLLM: Modeling Video Sequence with Large Language Models

1 code implementation22 May 2023 Guo Chen, Yin-Dong Zheng, Jiahao Wang, Jilan Xu, Yifei Huang, Junting Pan, Yi Wang, Yali Wang, Yu Qiao, Tong Lu, Limin Wang

Building upon this insight, we propose a novel framework called VideoLLM that leverages the sequence reasoning capabilities of pre-trained LLMs from natural language processing (NLP) for video sequence understanding.

Video Understanding
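The idea sketched in the excerpt above (using a pretrained LLM as the sequence reasoner over video) can be illustrated roughly as follows. This is a minimal sketch under assumptions, not the paper's actual architecture: the class name, the linear projector, the feature dimensions, and the HuggingFace-style `inputs_embeds` / `last_hidden_state` interface are all illustrative choices.

```python
import torch.nn as nn

class VideoAsTokens(nn.Module):
    """Hypothetical sketch: treat a video as a token sequence for a frozen LLM.

    Per-frame (or per-clip) visual features are linearly projected into the
    LLM's hidden space; the pretrained LLM then provides sequence reasoning
    over the projected tokens for downstream task heads.
    """
    def __init__(self, llm, visual_dim=768, llm_dim=4096):
        super().__init__()
        self.project = nn.Linear(visual_dim, llm_dim)  # visual -> LLM space
        self.llm = llm
        for p in self.llm.parameters():  # keep the pretrained LLM frozen
            p.requires_grad = False

    def forward(self, visual_feats):           # (B, T, visual_dim)
        tokens = self.project(visual_feats)    # (B, T, llm_dim)
        # Assumes a HuggingFace-style base model that accepts inputs_embeds.
        return self.llm(inputs_embeds=tokens).last_hidden_state
```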

MRSN: Multi-Relation Support Network for Video Action Detection

no code implementations24 Apr 2023 Yin-Dong Zheng, Guo Chen, Minglei Yuan, Tong Lu

Action detection is a challenging video understanding task that requires modeling spatio-temporal and interaction relations.

Action Detection Relation +1

Uncertainty-based Network for Few-shot Image Classification

no code implementations17 May 2022 Minglei Yuan, Qian Xu, Chunhao Cai, Yin-Dong Zheng, Tao Wang, Tong Lu

Specifically, we first apply data augmentation to the query instance, classify each augmented view, and calculate the mutual information of these classification scores (see the sketch below).

Classification Few-Shot Image Classification +1
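A rough sketch of the mutual-information step mentioned above. This is an assumption-laden illustration, not the paper's implementation: the augmentation count, the function name, and the BALD-style estimate I = H(mean_k p_k) - mean_k H(p_k) over augmented views are all hypothetical choices.

```python
import torch
import torch.nn.functional as F

def prediction_mutual_information(model, query, augment, n_aug=8, eps=1e-8):
    """Estimate uncertainty of a query image (assumed shape (1, C, H, W)) via
    the mutual information among its augmented views' classification scores."""
    model.eval()
    with torch.no_grad():
        # Classify each augmented view of the query instance.
        probs = torch.stack(
            [F.softmax(model(augment(query)), dim=-1) for _ in range(n_aug)]
        )  # (n_aug, 1, num_classes)

        mean_p = probs.mean(dim=0)
        # Entropy of the averaged prediction (total uncertainty).
        total = -(mean_p * (mean_p + eps).log()).sum()
        # Mean entropy of the individual predictions.
        expected = -(probs * (probs + eps).log()).sum(dim=-1).mean()
        # Mutual information: disagreement among the augmented views.
        return total - expected
```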

BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

2 code implementations5 May 2022 Min Yang, Guo Chen, Yin-Dong Zheng, Tong Lu, Limin Wang

Empirical results demonstrate that our PlusTAD is very efficient and significantly outperforms previous methods on the THUMOS14 and FineAction datasets.

Action Detection object-detection +3

DCAN: Improving Temporal Action Detection via Dual Context Aggregation

1 code implementation7 Dec 2021 Guo Chen, Yin-Dong Zheng, Limin Wang, Tong Lu

Specifically, we design the Multi-Path Temporal Context Aggregation (MTCA) to achieve smooth context aggregation at the boundary level and precise evaluation of boundaries.

Action Detection Temporal Action Localization

Dynamic Sampling Networks for Efficient Action Recognition in Videos

no code implementations28 Jun 2020 Yin-Dong Zheng, Zhao-Yang Liu, Tong Lu, Li-Min Wang

Existing action recognition methods are mainly based on clip-level classifiers such as two-stream CNNs or 3D CNNs, which are trained on randomly selected clips and applied to densely sampled clips during testing (see the sketch below).

Action Recognition In Videos
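The conventional clip-level pipeline that the excerpt above contrasts against can be sketched as below. This illustrates the standard random-clip training / dense-clip testing scheme only, not the paper's dynamic sampling method; the function names, clip length, and stride are illustrative assumptions.

```python
import random
import torch

def sample_training_clip(video, clip_len=16):
    """Randomly sample one training clip from a video tensor of shape
    (T, C, H, W); assumes the video has at least clip_len frames."""
    start = random.randint(0, video.shape[0] - clip_len)
    return video[start:start + clip_len]

def dense_test_scores(model, video, clip_len=16, stride=8):
    """Densely sample clips at test time and average their class scores
    into a video-level prediction."""
    scores = []
    with torch.no_grad():
        for start in range(0, video.shape[0] - clip_len + 1, stride):
            clip = video[start:start + clip_len]
            scores.append(model(clip.unsqueeze(0)).softmax(dim=-1))
    return torch.stack(scores).mean(dim=0)
```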
