1 code implementation • 18 Mar 2024 • Xiaoqiong Liu, Yunhe Feng, Shu Hu, Xiaohui Yuan, Heng Fan
Addressing this, we propose UAV-C, a large-scale benchmark for assessing robustness of UAV trackers under common corruptions.
1 code implementation • 15 Mar 2024 • Kareem Shaik, Dali Wang, Weijian Zheng, Qinglei Cao, Heng Fan, Peter Schwartz, Yunhe Feng
S3LLM demonstrates the potential of using locally deployed open-source LLMs for the rapid understanding of large-scale scientific computing software, eliminating the need for extensive coding expertise, and thereby making the process more efficient and effective.
no code implementations • 8 Mar 2024 • Yunhao Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang
Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., "where") in videos.
no code implementations • 8 Mar 2024 • Liting Lin, Heng Fan, Zhipeng Zhang, YaoWei Wang, Yong Xu, Haibin Ling
The shared embeddings, which describe the absolute coordinates of multi-resolution images (namely, the template and search images), are inherited from the pre-trained backbones.
1 code implementation • 6 Mar 2024 • Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang
The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.
no code implementations • 26 Feb 2024 • Xin Wang, Shu Hu, Heng Fan, Hongtu Zhu, Xin Li
Neural Radiance Fields (NeRF), as a pioneering technique in computer vision, offer great potential to revolutionize medical imaging by synthesizing three-dimensional representations from the projected two-dimensional image data.
1 code implementation • 3 Jan 2024 • Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang
The key of CG-STVG lies in two specially designed modules, including instance context generation (ICG), which focuses on discovering visual context information (in both appearance and motion) of the instance, and instance context refinement (ICR), which aims to improve the instance context from ICG by eliminating irrelevant or even harmful information from the context.
Ranked #1 on Spatio-Temporal Video Grounding on HC-STVG1
1 code implementation • 11 Dec 2023 • Jifeng Shen, Teng Guo, Xin Zuo, Heng Fan, Wankou Yang
The AFSS module learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps with varying semantic granularity.
1 code implementation • 8 Dec 2023 • Deyuan Qu, Qi Chen, Tianyu Bai, Andy Qin, HongSheng Lu, Heng Fan, Song Fu, Qing Yang
Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles.
1 code implementation • 1 Dec 2023 • Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan
Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy for achieving multimodal semantic segmentation, which is training-costly due to the massive parameter updates in feature extraction and fusion.
Ranked #2 on Semantic Segmentation on SUN-RGBD (using extra training data)
1 code implementation • 26 Nov 2023 • Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang
Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.
1 code implementation • 27 Sep 2023 • Libo Zhang, Xin Gu, CongCong Li, Tiejian Luo, Heng Fan
Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs, and a spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow.
1 code implementation • ICCV 2023 • Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang
Addressing this, we study video captioning from a different perspective in the compressed domain, which brings multi-fold advantages over the existing pipeline: 1) Compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) The captioning model is more efficient in inference as smaller and less redundant information is processed.
Ranked #8 on Video Captioning on VATEX
no code implementations • 18 Sep 2023 • Hao Wang, Libo Zhang, Heng Fan, Tiejian Luo
Meanwhile, we propose a cross-granularity attention module to align the interactions modeled by the three branches of transformers, so that the three branches can support each other to exploit the most discriminative semantic information of different granularities for accurate predictions of captions.
1 code implementation • ICCV 2023 • Wenzhang Zhou, Heng Fan, Tiejian Luo, Libo Zhang
In this work, drawing inspiration from the concept of stability in control theory, namely that a robust system must remain consistent both externally and internally regardless of disturbances, we propose a novel framework that achieves unsupervised domain adaptive detection through stability analysis.
1 code implementation • 15 Aug 2023 • Jifeng Shen, Yifei Chen, Yue Liu, Xin Zuo, Heng Fan, Wankou Yang
Effective feature fusion of multispectral images plays a crucial role in multi-spectral object detection.
Ranked #2 on Object Detection on VEDAI
no code implementations • 15 Aug 2023 • Yunhao Li, Zhen Xiao, Lin Yang, Dan Meng, Xin Zhou, Heng Fan, Libo Zhang
To the best of our knowledge, AttMOT is the first MOT dataset with semantic attributes.
1 code implementation • 19 Jul 2023 • Mingzhe Guo, Zhipeng Zhang, Liping Jing, Haibin Ling, Heng Fan
To thoroughly evidence the effectiveness of our method, we integrate the proposed framework on three tracking methods with different designs, i.e., the CNN-based SiamCAR, the Transformer-based OSTrack, and the hybrid structure TransT.
1 code implementation • 17 Jul 2023 • Yongsheng Yu, Heng Fan, Libo Zhang
Firstly, we pretrain an image inpainting model DMT_img to serve as a prior for distilling the video model DMT_vid, thereby benefiting the hallucination of deficiency cases.
Ranked #1 on Video Inpainting on DAVIS
no code implementations • 19 May 2023 • Yongsheng Yu, Hao Wang, Tiejian Luo, Heng Fan, Libo Zhang
In this paper, we propose a novel, simple yet effective method for Multi-modal Guided Image Completion, dubbed MaGIC, which not only supports a wide range of single modalities as guidance (e.g., text, canny edge, sketch, segmentation, depth, and pose), but also adapts to arbitrarily customized combinations of these modalities (i.e., arbitrary multi-modality) for image completion.
1 code implementation • ICCV 2023 • Bohai Gu, Heng Fan, Libo Zhang
Current arbitrary style transfer models are limited to either image or video domains.
no code implementations • 20 Mar 2023 • Zhenyu Li, Zhipeng Zhang, Heng Fan, Yuan He, Ke Wang, Xianming Liu, Junjun Jiang
In this paper, we improve the challenging monocular 3D object detection problem with a general semi-supervised framework.
1 code implementation • 19 Mar 2023 • Srikar Yellapragada, Zhenghong Li, Kevin Bhadresh Doshi, Purva Makarand Mhasakar, Heng Fan, Jie Wei, Erik Blasch, Bin Zhang, Haibin Ling
In this paper, we present a meticulously crafted and annotated benchmark, called CCTV-Gun, which addresses the challenges of detecting handguns in real-world CCTV images.
no code implementations • ICCV 2023 • Xinran Liu, Xiaoqiong Liu, Ziruo Yi, Xin Zhou, Thanh Le, Libo Zhang, Yan Huang, Qing Yang, Heng Fan
In addition, we further derive a variant named PlanarTrack$_{\mathbf{BB}}$ for generic object tracking from PlanarTrack.
no code implementations • 1 Jan 2023 • Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling
To reduce discrepancy in feature distributions between two domains, recent approaches achieve domain adaptation through feature alignment in different granularities via adversarial learning.
3 code implementations • 19 Nov 2022 • Libo Zhang, Lutao Jiang, Ruyi Ji, Heng Fan
Automatic security inspection relying on computer vision technology is a challenging task in real-world scenarios due to many factors, such as intra-class variance, class imbalance, and occlusion.
no code implementations • 25 Aug 2022 • Yongsheng Yu, Libo Zhang, Heng Fan, Tiejian Luo
Addressing this problem, in this paper, we devise a novel GAN inversion model for image inpainting, dubbed InvertFill, mainly consisting of an encoder with a pre-modulation module and a GAN generator with F&W+ latent space.
1 code implementation • 3 Jul 2022 • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing
By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking and hope to open more possibilities for future tracking beyond Transformer.
no code implementations • 30 Apr 2022 • Libo Zhang, Junyuan Gao, Zhen Xiao, Heng Fan
Multi-animal tracking (MAT), a multi-object tracking (MOT) problem, is crucial for animal motion and behavior analysis and has many important applications in fields such as biology, ecology, and animal conservation.
no code implementations • 7 Jan 2022 • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing, Yilin Lyu, Bing Li, Weiming Hu
The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer for improvements, as evidenced by our extensive experiments on multiple benchmarks.
1 code implementation • 2 Dec 2021 • Liting Lin, Heng Fan, Zhipeng Zhang, Yong Xu, Haibin Ling
The potential of Transformer in representation learning remains under-explored.
Ranked #10 on Visual Object Tracking on TrackingNet
no code implementations • 19 Oct 2021 • Heng Fan, Jiaxiang Ren, Jie Yang, Yi-Xian Qin, Haibin Ling
The aim of this study was to investigate whether a deep convolutional neural network (CNN) with an attention module can detect osteoporosis on panoramic radiographs.
1 code implementation • 19 Jul 2021 • Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, QinGhua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao
Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint.
no code implementations • 25 Nov 2020 • Heng Fan, Haibin Ling
The key is to bridge box regression and classification via an alignment step, which leads to more accurate features for proposal classification with improved robustness.
no code implementations • ICCV 2021 • Heng Fan, Halady Akhilesha Miththanthaya, Harshit, Siranjiv Ramana Rajan, Xiaoqiong Liu, Zhilin Zou, Yuewei Lin, Haibin Ling
To the best of our knowledge, TOTB is the first benchmark dedicated to transparent object tracking.
1 code implementation • 8 Sep 2020 • Heng Fan, Hexin Bai, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Harshit, Mingzhen Huang, Juehuan Liu, Yong Xu, Chunyuan Liao, Lin Yuan, Haibin Ling
The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real-world video footage, such as the targets disappearing and re-appearing.
2 code implementations • 16 Jan 2020 • Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, QinGhua Hu, Haibin Ling
We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking.
no code implementations • 15 Dec 2019 • Jianqing Jia, Semir Elezovikj, Heng Fan, Shuojin Yang, Jing Liu, Wei Guo, Chiu C. Tan, Haibin Ling
Our solution encodes the constraints for placing labels in an optimization problem to obtain the final label layout, and the labels will be placed in appropriate positions to reduce the chances of overlaying important real-world objects in street view AR scenarios.
no code implementations • 18 Nov 2019 • Heng Fan, Fan Yang, Peng Chu, Lin Yuan, Haibin Ling
For the analysis component, given the tracking results on all sequences, it investigates the behavior of the tracker under each individual factor and generates the report automatically.
1 code implementation • 25 Oct 2019 • Liu Ying, Heng Fan, Fuchuan Ni, Jinhai Xiang
In addition, to further improve the transfer accuracy of generated images, an attribute adversarial classifier (referred to as Atta-cls) is introduced to guide the generator from the perspective of attribute through learning the defects of attribute transfer images.
no code implementations • 7 Sep 2019 • Zhen-Biao Yang, Pei-Rong Han, Xin-Jie Huang, Wen Ning, Hekang Li, Kai Xu, Dongning Zheng, Heng Fan, Shi-Biao Zheng
The no-cloning theorem forbids perfect cloning of an unknown quantum state.
Quantum Physics
1 code implementation • ICCV 2019 • Fan Yang, Heng Fan, Peng Chu, Erik Blasch, Haibin Ling
The key components in ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet).
no code implementations • 21 Feb 2019 • Peng Chu, Heng Fan, Chiu C. Tan, Haibin Ling
To address this issue, in this paper we propose an instance-aware tracker to integrate SOT techniques for MOT by encoding awareness both within and between target models.
no code implementations • CVPR 2019 • Heng Fan, Haibin Ling
C-RPN is trained end-to-end with the multi-task loss function.
no code implementations • 9 Nov 2018 • Heng Fan, Peng Chu, Longin Jan Latecki, Haibin Ling
Recurrent neural networks (RNNs) have shown the ability to improve scene parsing through capturing long-range dependencies among image units.
1 code implementation • CVPR 2019 • Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling
In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking.
no code implementations • 15 May 2018 • Qin Zhou, Heng Fan, Hua Yang, Hang Su, Shibao Zheng, Shuang Wu, Haibin Ling
To address this problem, in this paper, we present a robust and efficient graph correspondence transfer (REGCT) approach for explicit spatial alignment in Re-ID.
no code implementations • 1 Apr 2018 • Qin Zhou, Heng Fan, Shibao Zheng, Hang Su, Xinzhe Li, Shuang Wu, Haibin Ling
In this paper, we propose a graph correspondence transfer (GCT) approach for person re-identification.
no code implementations • 22 Mar 2018 • Zhigang Chang, Qin Zhou, Heng Fan, Hang Su, Hua Yang, Shibao Zheng, Haibin Ling
Meanwhile, a weighting scheme is applied on the bilinear coding to adaptively adjust the weights of local features at different locations based on their importance in recognition, further improving the discriminability of feature aggregation.
no code implementations • 30 Jan 2018 • Heng Fan, Haibin Ling
Being intensively studied, visual object tracking has witnessed great advances in either speed (e.g., with correlation filters) or accuracy (e.g., with deep features).
no code implementations • 21 Jan 2018 • Heng Fan, Haibin Ling
Recently, recurrent neural networks (RNNs) have demonstrated the ability to improve scene labeling through capturing long-range dependencies among image units.
1 code implementation • 8 Dec 2017 • Jun Wang, Zhao-Yu Han, Song-Bo Wang, Zeyang Li, Liang-Zhu Mu, Heng Fan, Lei Wang
We propose a quantum tomography scheme for pure qudit systems which adopts random base measurements and generative learning methods, along with a built-in fidelity estimation approach to assess the reliability of the tomographic states.
Quantum Physics
1 code implementation • 6 Sep 2017 • Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, Pan Zhang
Generative modeling, which learns joint probability distribution from data and generates samples according to it, is an important task in machine learning and artificial intelligence.
no code implementations • ICCV 2017 • Heng Fan, Haibin Ling
In this paper we study the problem from a new perspective and present a novel parallel tracking and verifying (PTAV) framework, by taking advantage of the ubiquity of multi-thread techniques and borrowing from the success of parallel tracking and mapping in visual SLAM.
no code implementations • 21 Nov 2016 • Heng Fan, Haibin Ling
Convolutional neural networks (CNNs) have drawn increasing interest in visual tracking owing to their power in feature extraction.
no code implementations • 8 Jul 2016 • Heng Fan, Xue Mei, Danil Prokhorov, Haibin Ling
Context in images is crucial for scene labeling, yet existing methods only exploit local context generated from a small surrounding area of an image patch or a pixel; by contrast, long-range and global contextual information is ignored.