1 code implementation • 3 Feb 2024 • Zihan Li, Yuan Zheng, Dandan Shan, Shuzhou Yang, Qingde Li, Beizhan Wang, YuanTing Zhang, Qingqi Hong, Dinggang Shen
The proposed ScribFormer model has a triple-branch structure, i.e., a hybrid of a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch (a toy sketch of this layout follows this entry).
Ranked #1 on Semantic Segmentation on ACDC Scribbles
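To picture the triple-branch layout, here is a minimal PyTorch sketch. It is an illustration only, not the released ScribFormer code: the module sizes, the additive fusion, and the way the ACAM branch gates the features are all assumptions made for readability.

```python
import torch
import torch.nn as nn

# Toy triple-branch segmenter: CNN branch + Transformer branch + an
# ACAM-style class-activation branch. All sizes/fusion are assumptions.
class TripleBranchSeg(nn.Module):
    def __init__(self, in_ch=1, num_classes=4, dim=64):
        super().__init__()
        # CNN branch: local features
        self.cnn = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        # Transformer branch: global context over flattened pixel tokens
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc, num_layers=2)
        # ACAM-style branch: 1x1 conv producing per-class activation maps
        self.acam = nn.Conv2d(dim, num_classes, 1)
        self.head = nn.Conv2d(dim, num_classes, 1)

    def forward(self, x):
        f_cnn = self.cnn(x)                            # (B, dim, H, W)
        B, C, H, W = f_cnn.shape
        tokens = f_cnn.flatten(2).transpose(1, 2)      # (B, H*W, dim)
        f_tr = self.transformer(tokens).transpose(1, 2).reshape(B, C, H, W)
        fused = f_cnn + f_tr                           # simple additive fusion (assumed)
        cam = self.acam(fused)                         # class activation maps
        attn = torch.sigmoid(cam).mean(1, keepdim=True)  # attention derived from CAMs
        return self.head(fused * attn), cam

model = TripleBranchSeg()
logits, cam = model(torch.randn(2, 1, 64, 64))
print(logits.shape, cam.shape)  # both (2, 4, 64, 64)
```

Under scribble supervision, the CAM output would provide a second, weak training signal alongside the segmentation logits; consult the linked code for how ScribFormer actually couples the branches.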
no code implementations • 8 Dec 2023 • Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik
The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand reconstruction.
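To make "scaling up capacity" concrete, here is a hypothetical sketch of the general recipe: a ViT-style backbone regressing a flat vector of MANO-like hand parameters, where capacity is scaled by widening or deepening the encoder. The parameter split (45 pose + 10 shape + 6 camera) and all sizes are assumptions, not HaMeR's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical ViT-style hand regressor; not the HaMeR code.
class HandRegressor(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=6, n_params=61):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, patch, stride=patch)  # patchify
        n_tok = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, n_tok, dim))        # learned positions
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # n_params: e.g. 45 pose + 10 shape + 6 camera (assumed split)
        self.head = nn.Linear(dim, n_params)

    def forward(self, x):
        t = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos
        return self.head(self.encoder(t).mean(dim=1))  # pooled tokens -> params

params = HandRegressor()(torch.randn(1, 3, 224, 224))
print(params.shape)  # torch.Size([1, 61])
```

Scaling `dim`/`depth`, together with the training data, is the lever the abstract credits for HaMeR's accuracy.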
1 code implementation • 30 Jul 2023 • Zihan Li, Yuan Zheng, Xiangde Luo, Dandan Shan, Qingqi Hong
We evaluate ScribbleVC on three benchmark datasets and compare it with state-of-the-art methods.
Ranked #2 on Semantic Segmentation on ACDC Scribbles
1 code implementation • 1 Mar 2023 • Dandan Shan, Zihan Li, Wentao Chen, Qingde Li, Jie Tian, Qingqi Hong
Segmentation of COVID-19 lesions can assist physicians in better diagnosis and treatment of COVID-19.
Ranked #1 on Medical Image Segmentation on MosMedData
3 code implementations • 26 Sep 2022 • Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen
VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets.
1 code implementation • 16 Feb 2022 • Oana Ignat, Santiago Castro, YuHang Zhou, Jiajun Bao, Dandan Shan, Rada Mihalcea
We consider the task of temporal human action localization in lifestyle vlogs.
no code implementations • NeurIPS 2021 • Dandan Shan, Richard Higgins, David Fouhey
In this paper, we learn to segment hands and hand-held objects from motion.
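As a toy illustration of segmenting moving regions from motion (not the paper's learned approach), one can threshold optical-flow magnitude between consecutive frames; the synthetic frames, threshold, and flow settings below are arbitrary assumptions.

```python
import numpy as np
import cv2  # pip install opencv-python

# Two synthetic grayscale frames with a rectangle ("hand") that shifts right.
prev_f = np.zeros((120, 160), np.uint8)
next_f = np.zeros((120, 160), np.uint8)
cv2.rectangle(prev_f, (40, 40), (70, 80), 255, -1)
cv2.rectangle(next_f, (48, 40), (78, 80), 255, -1)

# Dense Farneback optical flow, then a crude motion mask by thresholding.
flow = cv2.calcOpticalFlowFarneback(prev_f, next_f, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
mag = np.linalg.norm(flow, axis=2)          # per-pixel motion magnitude
motion_mask = (mag > 1.0).astype(np.uint8)  # 1 where something moved
print("moving pixels:", int(motion_mask.sum()))
```

The paper instead learns the grouping, so its masks separate the hand from the held object rather than merely marking pixels that moved.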
1 code implementation • CVPR 2020 • Dandan Shan, Jiaqi Geng, Michelle Shu, David F. Fouhey
Hands are the central means by which humans manipulate their world, and being able to reliably extract hand state information from Internet videos of humans using their hands has the potential to pave the way to systems that can learn from petabytes of video data.