1 code implementation • 2 Jan 2024 • Xinpeng Ding, Jianhua Han, Hang Xu, Xiaodan Liang, Wei Zhang, Xiaomeng Li
BEV-InMLLM integrates multi-view, spatial awareness, and temporal semantics to enhance MLLMs' capabilities on NuInstruct tasks.
no code implementations • 5 Dec 2023 • Guozhang Li, Xinpeng Ding, De Cheng, Jie Li, Nannan Wang, Xinbo Gao
To further reduce the noise in the expanded boundaries, we combine mutual learning with a tailored proposal-level contrastive objective, providing a learnable way to balance the incomplete yet clean (initial) boundaries against the comprehensive yet noisy (expanded) ones, yielding more precise boundaries.
1 code implementation • ICCV 2023 • Jiewen Yang, Xinpeng Ding, Ziyang Zheng, Xiaowei Xu, Xiaomeng Li
This paper studies unsupervised domain adaptation (UDA) for echocardiogram video segmentation, where the goal is to generalize the model trained on the source domain to other unlabelled target domains.
1 code implementation • 20 Sep 2023 • Ziyang Zheng, Jiewen Yang, Xinpeng Ding, Xiaowei Xu, Xiaomeng Li
Additionally, a Multi-view Local-based Fusion Module (MLFM) is designed to extract correlations of cardiac structures from different views.
no code implementations • 11 Sep 2023 • Xinpeng Ding, Jianhua Han, Hang Xu, Wei Zhang, Xiaomeng Li
For the first time, we leverage a single multimodal large language model (MLLM) to consolidate multiple autonomous driving tasks from videos, i.e., the Risk Object Localization and Intention and Suggestion Prediction (ROLISP) task.
no code implementations • 24 Aug 2023 • Siming Fu, Xiaoxuan He, Xinpeng Ding, Yuchen Cao, Hualiang Wang
The category prototype-guided mechanism for image-text matching drives the features of different classes to converge to distinct category prototypes that are kept uniformly distributed in the feature space, thereby improving class boundaries.
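A minimal NumPy sketch of how such a prototype-guided objective might look, assuming cosine-similarity alignment between features and fixed unit-norm prototypes; all function names are illustrative, not the authors' implementation:

```python
import numpy as np

def make_prototypes(num_classes, dim, seed=0):
    """Illustrative stand-in: random unit vectors as fixed category
    prototypes (the paper keeps prototypes uniformly distributed;
    here we simply L2-normalize random Gaussians)."""
    rng = np.random.default_rng(seed)
    protos = rng.standard_normal((num_classes, dim))
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def prototype_alignment_loss(features, labels, prototypes):
    """Pull each L2-normalized feature toward its class prototype:
    loss = mean(1 - cos(feature_i, prototype[label_i]))."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    cos = np.sum(feats * prototypes[labels], axis=1)
    return float(np.mean(1.0 - cos))
```

Because the prototypes are fixed and well separated, minimizing this loss drags each class's features toward its own prototype, which is one way a uniform prototype layout can translate into cleaner class boundaries.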
1 code implementation • 15 Aug 2023 • Zheang Huai, Xinpeng Ding, Yi Li, Xiaomeng Li
To this end, we propose a context-aware pseudo-label refinement method for SF-UDA.
1 code implementation • CVPR 2023 • Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Xiaoyu Wang, Xinbo Gao
For the discriminative objective, we propose a Text-Segment Mining (TSM) mechanism, which constructs a text description based on the action class label, and regards the text as the query to mine all class-related segments.
1 code implementation • 25 Apr 2023 • Guozhang Li, De Cheng, Xinpeng Ding, Nannan Wang, Jie Li, Xinbo Gao
The proposed Bi-SCC first applies a temporal context augmentation to generate an augmented video that breaks the inter-video correlation between positive actions and their co-scene actions; a semantic consistency constraint (SCC) then enforces the predictions on the original and augmented videos to be consistent, thereby suppressing the co-scene actions.
Weakly-supervised Temporal Action Localization
1 code implementation • 20 Oct 2022 • Weihang Dai, Xiaomeng Li, Xinpeng Ding, Kwang-Ting Cheng
We also introduce teacher-student distillation to distill the information from LV segmentation masks into an end-to-end LVEF regression model that only requires video inputs.
no code implementations • 30 Jul 2022 • Xinpeng Ding, Jingweng Yang, Xiaowei Hu, Xiaomeng Li
We further design a new evaluation metric to evaluate the temporal stability of the video shadow detection results.
1 code implementation • 19 May 2022 • Xinpeng Ding, Ziwei Liu, Xiaomeng Li
Our key insight is to distill knowledge from publicly available models trained on large generic datasets to facilitate the self-supervised learning of surgical videos.
1 code implementation • 16 Feb 2022 • Xinpeng Ding, Xinjian Yan, Zixun Wang, Wei Zhao, Jian Zhuang, Xiaowei Xu, Xiaomeng Li
Our study uncovers unique insights of surgical phase recognition with timestamp supervision: 1) timestamp annotation can reduce annotation time by 74% compared with full annotation, and surgeons tend to annotate timestamps near the middle of phases; 2) extensive experiments demonstrate that our method achieves competitive results compared with fully supervised methods while reducing manual annotation cost; 3) less is more in surgical phase recognition, i.e., fewer but discriminative pseudo labels outperform full labels containing ambiguous frames; 4) the proposed UATD can be used as a plug-and-play method to clean ambiguous labels near phase boundaries and improve the performance of current surgical phase recognition methods.
1 code implementation • 22 Nov 2021 • Xinpeng Ding, Xiaomeng Li
Automatic surgical phase recognition plays a vital role in robot-assisted surgeries.
no code implementations • ICCV 2021 • Xinpeng Ding, Nannan Wang, Shiwei Zhang, De Cheng, Xiaomeng Li, Ziyuan Huang, Mingqian Tang, Xinbo Gao
The contrastive objective aims to learn effective representations by contrastive learning, while the caption objective can train a powerful video encoder supervised by texts.
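Contrastive objectives of this kind are typically instantiated as a symmetric InfoNCE loss over paired embeddings; a minimal NumPy sketch under that assumption (names and temperature value are illustrative, not taken from the paper):

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired video/text embeddings.
    Positives are the matching (i, i) pairs on the diagonal; every
    other pair in the batch serves as a negative."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # (B, B) similarity matrix

    def xent_diag(lg):
        # cross-entropy with the diagonal entries as targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # average the video-to-text and text-to-video directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

The loss is small when each video embedding is closest to its own caption and large when the pairing is scrambled, which is what pushes the encoder toward text-aligned representations.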
no code implementations • 3 Jul 2020 • Xinpeng Ding, Nannan Wang, Xinbo Gao, Jie Li, Xiaoyu Wang, Tongliang Liu
Specifically, we devise a partial segment loss, which can be regarded as a form of loss sampling, to learn integral action parts from labeled segments.
Weakly-supervised Temporal Action Localization
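One way to read "loss sampling" is a frame-level cross-entropy evaluated only inside the labeled segments, with unlabeled frames masked out; a minimal NumPy sketch under that assumption (the function name and mask convention are illustrative, not the authors' code):

```python
import numpy as np

def partial_segment_loss(frame_logits, frame_labels, labeled_mask):
    """Cross-entropy computed only on frames inside labeled segments
    (labeled_mask == 1); unlabeled frames contribute nothing, which
    is the 'loss sampling' view of the objective."""
    # numerically stable log-softmax over classes
    lg = frame_logits - frame_logits.max(axis=1, keepdims=True)
    logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(frame_labels)), frame_labels]
    m = labeled_mask.astype(float)
    # average only over the sampled (labeled) frames
    return float((nll * m).sum() / max(m.sum(), 1.0))
```

Frames outside labeled segments are simply excluded from the average, so noisy or ambiguous unlabeled regions cannot pull the model toward wrong targets.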