no code implementations • ICCV 2023 • Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He
In this paper, we show that recent advances in video representation learning and pre-trained vision-language models allow for substantial improvements in self-supervised video object localization.
1 code implementation • ICCV 2023 • Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines.
1 code implementation • 11 Jul 2023 • Pengfei Li, Gang Liu, Jinlong He, Zixu Zhao, Shenjun Zhong
Medical visual question answering (VQA) is a challenging task that requires answering clinical questions about a given medical image by taking both visual and language information into consideration.
Ranked #1 on Medical Visual Question Answering on PathVQA
no code implementations • 12 Mar 2023 • Yi Wang, Jiaze Wang, Jinpeng Li, Zixu Zhao, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng
With Point-MAE as our baseline, our model surpasses previous methods by a significant margin, achieving 86.3% accuracy on ScanObjectNN and 94.1% accuracy on ModelNet40.
no code implementations • 20 Jul 2022 • Yang Yu, Zixu Zhao, Yueming Jin, Guangyong Chen, Qi Dou, Pheng-Ann Heng
Concretely, for trustworthy representation learning, we propose to incorporate pseudo labels to guide the pair selection, obtaining more reliable representation pairs for pixel contrast.
1 code implementation • 29 Mar 2022 • Yueming Jin, Yang Yu, Cheng Chen, Zixu Zhao, Pheng-Ann Heng, Danail Stoyanov
Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre.
no code implementations • 17 Feb 2022 • Zixu Zhao, Yueming Jin, Pheng-Ann Heng
Specifically, we introduce the prior query, which is encoded with previous temporal knowledge, to transfer tracking signals to current instances via identity matching.
no code implementations • ICCV 2021 • Zixu Zhao, Yueming Jin, Pheng-Ann Heng
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
1 code implementation • 30 Mar 2021 • Yueming Jin, Yonghao Long, Cheng Chen, Zixu Zhao, Qi Dou, Pheng-Ann Heng
In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns to augment the present features.
no code implementations • 24 Mar 2021 • Zixu Zhao, Yueming Jin, Bo Lu, Chi-Fai Ng, Qi Dou, Yun-hui Liu, Pheng-Ann Heng
To greatly increase the label efficiency, we explore a new problem, i.e., adaptive instrument segmentation, which is to effectively adapt one source model to new robotic surgical videos from multiple target domains, given only the annotated instruments in the first frame.
no code implementations • 18 Mar 2021 • Xiaojie Gao, Yueming Jin, Zixu Zhao, Qi Dou, Pheng-Ann Heng
Predicting future frames for robotic surgical video is an interesting, important yet extremely challenging problem, given that the operative tasks may have complex dynamics.
1 code implementation • 6 Jul 2020 • Zixu Zhao, Yueming Jin, Xiaojie Gao, Qi Dou, Pheng-Ann Heng
Considering the fast instrument motion, we further introduce a flow compensator to estimate intermediate motion within continuous frames, with a novel cycle learning strategy.
no code implementations • 3 May 2019 • Zixu Zhao, Huangjing Lin, Hao Chen, Pheng-Ann Heng
Automatic detection of cancer metastasis from whole slide images (WSIs) is a crucial step for subsequent patient staging and prognosis.
no code implementations • 24 Apr 2018 • Zixu Zhao
By comparing the segmentation results and their corresponding FI values, this novel method produces a machine-vision-based index that best fits the FI.
no code implementations • 23 Apr 2018 • Fouad Amer, Zixu Zhao, Siwei Tang, Wilfredo Torres
By matching the ORB feature of the tags with their corresponding features in the scene, it is then possible to localize the position of these tags both in point clouds constructed by ORB-SLAM2 and OpenSfM.
no code implementations • 22 Jan 2018 • Qianye Yang, Nannan Li, Zixu Zhao, Xingyu Fan, Eric I-Chao Chang, Yan Xu
Based on our proposed framework, we first propose a method for cross-modality registration that fuses the deformation fields to exploit the cross-modality information from the translated modalities.