1 code implementation • 12 Mar 2024 • Mingze Wang, Lili Su, Cilin Yan, Sheng Xu, Pengcheng Yuan, XiaoLong Jiang, Baochang Zhang
RSBuilding is designed to enhance cross-scene generalization and task universality.
1 code implementation • 17 May 2023 • Bohan Zeng, Shanglin Li, Xuhui Liu, Sicheng Gao, XiaoLong Jiang, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
1 code implementation • 23 Apr 2023 • Cilin Yan, Haochen Wang, Jie Liu, XiaoLong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves
Click-based interactive segmentation aims to generate target masks via human clicking, which facilitates efficient pixel-level annotation and image editing.
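Click-based methods typically feed user clicks to the network as extra input channels. As one common (illustrative, not this paper's exact) encoding, each pixel can store the clipped distance to the nearest click — a minimal sketch:

```python
import math

def click_distance_map(height, width, clicks, clip=20.0):
    """Encode clicks as a truncated distance map: each pixel stores its
    Euclidean distance to the nearest click, clipped to `clip`.
    (One common click encoding; the paper may use a different one.)"""
    grid = []
    for i in range(height):
        row = []
        for j in range(width):
            d = min(math.hypot(i - r, j - c) for r, c in clicks)
            row.append(min(d, clip))
        grid.append(row)
    return grid

# Positive- and negative-click maps are typically stacked with the RGB
# image as extra input channels to the segmentation network.
pos = click_distance_map(8, 8, [(3, 4)])
```

The map is zero at a click and grows with distance, giving the network a smooth spatial prior around each user interaction.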
no code implementations • 14 Apr 2023 • Jie Guo, Qimeng Wang, Yan Gao, XiaoLong Jiang, Xu Tang, Yao Hu, Baochang Zhang
CLIP (Contrastive Language-Image Pretraining) is well developed for open-vocabulary, zero-shot image-level recognition, but its application to pixel-level tasks remains under-explored: most efforts directly adopt CLIP features without deliberate adaptation.
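At image level, CLIP-style zero-shot recognition reduces to comparing one image embedding against the text embeddings of class prompts. A minimal sketch with toy vectors standing in for the real CLIP encoders (the prompts and numbers here are illustrative only):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_classify(image_emb, text_embs):
    """Pick the class whose text-prompt embedding is most similar to the
    image embedding -- the core of CLIP-style zero-shot inference."""
    scores = {name: cosine(image_emb, emb) for name, emb in text_embs.items()}
    return max(scores, key=scores.get), scores

# Toy embeddings standing in for CLIP's image/text encoders.
text_embs = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
}
label, scores = zero_shot_classify([0.8, 0.2, 0.1], text_embs)
# label == "a photo of a cat"
```

Pixel-level tasks are harder precisely because this comparison must be made per region or per pixel rather than once per image, which is where dedicated adaptation comes in.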
1 code implementation • ICCV 2023 • Haochen Wang, Cilin Yan, Shuai Wang, XiaoLong Jiang, Xu Tang, Yao Hu, Weidi Xie, Efstratios Gavves
Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos.
1 code implementation • 16 Feb 2023 • Keyan Chen, Wenyuan Li, Sen Lei, Jianqi Chen, XiaoLong Jiang, Zhengxia Zou, Zhenwei Shi
Despite its fruitful applications in remote sensing, image super-resolution is cumbersome to train and deploy, as separate models must be maintained for different resolution magnifications.
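The underlying idea of handling any magnification with one model is resolution-agnostic sampling: output pixels are placed at continuous coordinates in the input grid. A model-free bilinear sketch of that sampling step (learned methods replace the fixed kernel with a network, but keep the continuous-coordinate idea):

```python
def bilinear_resize(img, scale):
    """Resample a 2-D grid at an arbitrary (possibly non-integer) scale
    factor by interpolating at continuous input coordinates."""
    h, w = len(img), len(img[0])
    oh, ow = int(round(h * scale)), int(round(w * scale))
    out = []
    for oi in range(oh):
        y = min(oi / scale, h - 1)          # continuous source row
        i0 = int(y); i1 = min(i0 + 1, h - 1); fy = y - i0
        row = []
        for oj in range(ow):
            x = min(oj / scale, w - 1)      # continuous source column
            j0 = int(x); j1 = min(j0 + 1, w - 1); fx = x - j0
            top = img[i0][j0] * (1 - fx) + img[i0][j1] * fx
            bot = img[i1][j0] * (1 - fx) + img[i1][j1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out

hi = bilinear_resize([[0.0, 1.0], [1.0, 0.0]], 1.5)  # 2x2 -> 3x3
```

Because the scale factor is just a runtime argument, the same function serves x2, x3.5, or any other magnification — the property a single super-resolution model aims to learn.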
1 code implementation • CVPR 2023 • Keyan Chen, XiaoLong Jiang, Yao Hu, Xu Tang, Yan Gao, Jianqi Chen, Weidi Xie
In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario.
Ranked #1 on Open Vocabulary Attribute Detection on OVAD benchmark (using extra training data)
1 code implementation • CVPR 2021 • Haochen Wang, XiaoLong Jiang, Haibing Ren, Yao Hu, Song Bai
In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% J&F at 70 FPS on the DAVIS 2017 validation set, leading all contemporary solutions in overall accuracy and speed.
1 code implementation • 11 Jan 2021 • Tun Zhu, Daoxin Zhang, Yao Hu, Tianran Wang, XiaoLong Jiang, Jianke Zhu, Jiawei Li
Alongside the prevalence of mobile video, the general public increasingly consumes vertical videos on hand-held devices.
1 code implementation • 11 Jul 2019 • Xiaolong Jiang, Peizhao Li, Yanjing Li, Xian-Tong Zhen
In this work, we present an end-to-end framework to address data association in online Multiple-Object Tracking (MOT).
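Data association matches the current frame's detections to existing tracks using a pairwise cost such as box overlap or a learned affinity. The paper learns this step end-to-end; a greedy IoU matcher (a deliberately simplified baseline, not the paper's method) shows what is being learned:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(tracks, dets, thresh=0.3):
    """Greedy association: take the highest-overlap (track, detection)
    pairs first, never reusing a track or a detection."""
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(dets)), reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score >= thresh and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti); used_d.add(di)
    return matches

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(21, 21, 31, 31), (1, 1, 11, 11)]
# Each track pairs with its shifted counterpart despite the swapped order.
matches = associate(tracks, dets)
```

Learned approaches replace the raw IoU cost with a network-produced affinity, and often replace greedy selection with optimal (Hungarian) assignment.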
no code implementations • CVPR 2019 • Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xiantong Zhen, Xianbin Cao, David Doermann, Ling Shao
In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps.
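Density-map crowd counting targets a map whose integral equals the crowd count; ground truth is conventionally built by placing a normalized Gaussian at each annotated head position. A minimal sketch of that construction (the target the network regresses, not TEDnet itself):

```python
import math

def density_map(height, width, points, sigma=1.5):
    """Build a ground-truth density map: one normalized Gaussian per
    annotated head, so each person contributes exactly one unit of mass."""
    grid = [[0.0] * width for _ in range(height)]
    for r, c in points:
        kernel, total = [], 0.0
        for i in range(height):
            for j in range(width):
                g = math.exp(-((i - r) ** 2 + (j - c) ** 2) / (2 * sigma ** 2))
                kernel.append((i, j, g))
                total += g
        for i, j, g in kernel:  # normalize so the Gaussian sums to 1
            grid[i][j] += g / total
    return grid

dmap = density_map(16, 16, [(4, 4), (10, 12), (8, 3)])
count = sum(sum(row) for row in dmap)  # ~3.0: one unit of mass per person
```

Summing the predicted map at test time recovers the estimated count, which is why map quality directly determines counting accuracy.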
no code implementations • 16 Dec 2018 • Xiaolong Jiang, Peizhao Li, Xian-Tong Zhen, Xian-Bin Cao
To overcome the scarcity of object-centric information, the proposed AMNet, an end-to-end offline-trained two-stream network, deeply integrates appearance and motion features.