Search Results for author: Heng Fan

Found 56 papers, 27 papers with code

Benchmarking the Robustness of UAV Tracking Against Common Corruptions

1 code implementation • 18 Mar 2024 • Xiaoqiong Liu, Yunhe Feng, Shu Hu, Xiaohui Yuan, Heng Fan

Addressing this, we propose UAV-C, a large-scale benchmark for assessing robustness of UAV trackers under common corruptions.

Benchmarking


S3LLM: Large-Scale Scientific Software Understanding with LLMs using Source, Metadata, and Document

1 code implementation • 15 Mar 2024 • Kareem Shaik, Dali Wang, Weijian Zheng, Qinglei Cao, Heng Fan, Peter Schwartz, Yunhe Feng

S3LLM demonstrates the potential of using locally deployed open-source LLMs for the rapid understanding of large-scale scientific computing software, eliminating the need for extensive coding expertise, and thereby making the process more efficient and effective.

Natural Language Queries


Beyond MOT: Semantic Multi-Object Tracking

no code implementations • 8 Mar 2024 • Yunhao Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang

Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., "where") in videos.

Multi-Object Tracking Object +1


Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

no code implementations • 8 Mar 2024 • Liting Lin, Heng Fan, Zhipeng Zhang, YaoWei Wang, Yong Xu, Haibin Ling

The shared embeddings, which describe the absolute coordinates of multi-resolution images (namely, the template and search images), are inherited from the pre-trained backbones.

Inductive Bias Position +1


VastTrack: Vast Category Visual Object Tracking

1 code implementation • 6 Mar 2024 • Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang

The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.

Object Visual Object Tracking +1


Neural Radiance Fields in Medical Imaging: Challenges and Next Steps

no code implementations • 26 Feb 2024 • Xin Wang, Shu Hu, Heng Fan, Hongtu Zhu, Xin Li

Neural Radiance Fields (NeRF), as a pioneering technique in computer vision, offer great potential to revolutionize medical imaging by synthesizing three-dimensional representations from the projected two-dimensional image data.


Context-Guided Spatio-Temporal Video Grounding

1 code implementation • 3 Jan 2024 • Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang

The key of CG-STVG lies in two specially designed modules, including instance context generation (ICG), which focuses on discovering visual context information (in both appearance and motion) of the instance, and instance context refinement (ICR), which aims to improve the instance context from ICG by eliminating irrelevant or even harmful information from the context.

Ranked #1 on Spatio-Temporal Video Grounding on HC-STVG1

Object Spatio-Temporal Video Grounding +1


SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

1 code implementation • 11 Dec 2023 • Jifeng Shen, Teng Guo, Xin Zuo, Heng Fan, Wankou Yang

The AFSS module learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps with varying semantic granularity.

Attribute Pedestrian Attribute Recognition


SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles

1 code implementation • 8 Dec 2023 • Deyuan Qu, Qi Chen, Tianyu Bai, Andy Qin, HongSheng Lu, Heng Fan, Song Fu, Qing Yang

Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles.

3D Object Detection object-detection


Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

1 code implementation • 1 Dec 2023 • Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan

Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy for achieving multimodal semantic segmentation, which is training-costly due to the massive parameter updates in feature extraction and fusion.

Ranked #2 on Semantic Segmentation on SUN-RGBD (using extra training data)

Decoder object-detection +7


Flow-Guided Diffusion for Video Inpainting

1 code implementation • 26 Nov 2023 • Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang

Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.

Denoising Image Generation +2


Local Compressed Video Stream Learning for Generic Event Boundary Detection

1 code implementation • 27 Sep 2023 • Libo Zhang, Xin Gu, CongCong Li, Tiejian Luo, Heng Fan

Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs, and a spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow.

Boundary Detection Generic Event Boundary Detection +1


Accurate and Fast Compressed Video Captioning

1 code implementation • ICCV 2023 • Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang

Addressing this, we study video captioning from a different perspective in the compressed domain, which brings multi-fold advantages over the existing pipeline: 1) compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) the captioning model is more efficient in inference, as smaller and less redundant information is processed.

Ranked #8 on Video Captioning on VATEX

Video Captioning


Collaborative Three-Stream Transformers for Video Captioning

no code implementations • 18 Sep 2023 • Hao Wang, Libo Zhang, Heng Fan, Tiejian Luo

Meanwhile, we propose a cross-granularity attention module to align the interactions modeled by the three branches of transformers, so that the three branches can support each other to exploit the most discriminative semantic information of different granularities for accurate predictions of captions.

Sentence Video Captioning


Unsupervised Domain Adaptive Detection with Network Stability Analysis

1 code implementation • ICCV 2023 • Wenzhang Zhou, Heng Fan, Tiejian Luo, Libo Zhang

In this work, drawing inspiration from the concept of stability in control theory, namely that a robust system must remain consistent both externally and internally regardless of disturbances, we propose a novel framework that achieves unsupervised domain adaptive detection through stability analysis.

Domain Adaptation


ICAFusion: Iterative Cross-Attention Guided Feature Fusion for Multispectral Object Detection

1 code implementation • 15 Aug 2023 • Jifeng Shen, Yifei Chen, Yue Liu, Xin Zuo, Heng Fan, Wankou Yang

Effective feature fusion of multispectral images plays a crucial role in multispectral object detection.

Ranked #2 on Object Detection on VEDAI

Multispectral Object Detection object-detection +1


AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes

no code implementations • 15 Aug 2023 • Yunhao Li, Zhen Xiao, Lin Yang, Dan Meng, Xin Zhou, Heng Fan, Libo Zhang

To the best of our knowledge, AttMOT is the first MOT dataset with semantic attributes.

Attribute Multi-Object Tracking +1


Divert More Attention to Vision-Language Object Tracking

1 code implementation • 19 Jul 2023 • Mingzhe Guo, Zhipeng Zhang, Liping Jing, Haibin Ling, Heng Fan

To thoroughly evidence the effectiveness of our method, we integrate the proposed framework on three tracking methods with different designs, i.e., the CNN-based SiamCAR, the Transformer-based OSTrack, and the hybrid structure TransT.

Attribute Object +1


Deficiency-Aware Masked Transformer for Video Inpainting

1 code implementation • 17 Jul 2023 • Yongsheng Yu, Heng Fan, Libo Zhang

Firstly, we pretrain an image inpainting model, DMT_img, to serve as a prior for distilling the video model DMT_vid, thereby benefiting the hallucination of deficiency cases.

Ranked #1 on Video Inpainting on DAVIS

Hallucination Image Inpainting +2


MaGIC: Multi-modality Guided Image Completion

no code implementations • 19 May 2023 • Yongsheng Yu, Hao Wang, Tiejian Luo, Heng Fan, Libo Zhang

In this paper, we propose a novel, simple yet effective method for Multi-modal Guided Image Completion, dubbed MaGIC, which not only supports a wide range of single modalities as guidance (e.g., text, canny edge, sketch, segmentation, depth, and pose), but also adapts to arbitrarily customized combinations of these modalities (i.e., arbitrary multi-modality) for image completion.


Two Birds, One Stone: A Unified Framework for Joint Learning of Image and Video Style Transfers

1 code implementation • ICCV 2023 • Bohai Gu, Heng Fan, Libo Zhang

Current arbitrary style transfer models are limited to either image or video domains.

Computational Efficiency Style Transfer +1


Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection

no code implementations • 20 Mar 2023 • Zhenyu Li, Zhipeng Zhang, Heng Fan, Yuan He, Ke Wang, Xianming Liu, Junjun Jiang

In this paper, we improve the challenging monocular 3D object detection problem with a general semi-supervised framework.

Monocular 3D Object Detection object-detection +1


CCTV-Gun: Benchmarking Handgun Detection in CCTV Images

1 code implementation • 19 Mar 2023 • Srikar Yellapragada, Zhenghong Li, Kevin Bhadresh Doshi, Purva Makarand Mhasakar, Heng Fan, Jie Wei, Erik Blasch, Bin Zhang, Haibin Ling

In this paper, we present a meticulously crafted and annotated benchmark, called CCTV-Gun, which addresses the challenges of detecting handguns in real-world CCTV images.

Benchmarking object-detection +1


PlanarTrack: A Large-scale Challenging Benchmark for Planar Object Tracking

no code implementations • ICCV 2023 • Xinran Liu, Xiaoqiong Liu, Ziruo Yi, Xin Zhou, Thanh Le, Libo Zhang, Yan Huang, Qing Yang, Heng Fan

In addition, we further derive a variant named PlanarTrack$_{\mathbf{BB}}$ for generic object tracking from PlanarTrack.

Object Tracking


Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment

no code implementations • 1 Jan 2023 • Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling

To reduce the discrepancy in feature distributions between two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning.

Domain Adaptation object-detection +1


PIDray: A Large-scale X-ray Benchmark for Real-World Prohibited Item Detection

3 code implementations • 19 Nov 2022 • Libo Zhang, Lutao Jiang, Ruyi Ji, Heng Fan

Automatic security inspection relying on computer vision technology is a challenging task in real-world scenarios due to many factors, such as intra-class variance, class imbalance, and occlusion.

Binary Classification Instance Segmentation +4


High-Fidelity Image Inpainting with GAN Inversion

no code implementations • 25 Aug 2022 • Yongsheng Yu, Libo Zhang, Heng Fan, Tiejian Luo

Addressing this problem, in this paper, we devise a novel GAN inversion model for image inpainting, dubbed InvertFill, mainly consisting of an encoder with a pre-modulation module and a GAN generator with F&W+ latent space.

Image Inpainting


Divert More Attention to Vision-Language Tracking

1 code implementation • 3 Jul 2022 • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing

By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking and hope to open more possibilities for future tracking beyond Transformer.

Object Tracking


AnimalTrack: A Benchmark for Multi-Animal Tracking in the Wild

no code implementations • 30 Apr 2022 • Libo Zhang, Junyuan Gao, Zhen Xiao, Heng Fan

Multi-animal tracking (MAT), a multi-object tracking (MOT) problem, is crucial for animal motion and behavior analysis and has many important applications in fields such as biology, ecology and animal conservation.

Multi-Object Tracking


Learning Target-aware Representation for Visual Tracking via Informative Interactions

no code implementations • 7 Jan 2022 • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing, Yilin Lyu, Bing Li, Weiming Hu

The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer for improvements, as evidenced by our extensive experiments on multiple benchmarks.

Representation Learning Visual Tracking


SwinTrack: A Simple and Strong Baseline for Transformer Tracking

1 code implementation • 2 Dec 2021 • Liting Lin, Heng Fan, Zhipeng Zhang, Yong Xu, Haibin Ling

The potential of Transformer in representation learning remains under-explored.

Ranked #10 on Visual Object Tracking on TrackingNet

Representation Learning Visual Object Tracking +1


Osteoporosis Prescreening using Panoramic Radiographs through a Deep Convolutional Neural Network with Attention Mechanism

no code implementations • 19 Oct 2021 • Heng Fan, Jiaxiang Ren, Jie Yang, Yi-Xian Qin, Haibin Ling

The aim of this study was to investigate whether a deep convolutional neural network (CNN) with an attention module can detect osteoporosis on panoramic radiographs.


VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

1 code implementation • 19 Jul 2021 • Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, QinGhua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed, Bakour Imene, Bin Dong, Binyu Zhang, Bouchali Hadia Nesma, Chenfeng Xu, Chenzhen Duan, Ciro Castiello, Corrado Mencar, Dingkang Liang, Florian Krüger, Gennaro Vessio, Giovanna Castellano, Jieru Wang, Junyu Gao, Khalid Abualsaud, Laihui Ding, Lei Zhao, Marco Cianciotta, Muhammad Saqib, Noor Almaadeed, Omar Elharrouss, Pei Lyu, Qi Wang, Shidong Liu, Shuang Qiu, Siyang Pan, Somaya Al-Maadeed, Sultan Daud Khan, Tamer Khattab, Tao Han, Thomas Golda, Wei Xu, Xiang Bai, Xiaoqing Xu, Xuelong Li, Yanyun Zhao, Ye Tian, Yingnan Lin, Yongchao Xu, Yuehan Yao, Zhenyu Xu, Zhijian Zhao, Zhipeng Luo, Zhiwei Wei, Zhiyuan Zhao

Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint.

Crowd Counting


CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking

no code implementations • 25 Nov 2020 • Heng Fan, Haibin Ling

The key is to bridge box regression and classification via an alignment step, which leads to more accurate features for proposal classification with improved robustness.

Classification General Classification +3


Transparent Object Tracking Benchmark

no code implementations • ICCV 2021 • Heng Fan, Halady Akhilesha Miththanthaya, Harshit, Siranjiv Ramana Rajan, Xiaoqiong Liu, Zhilin Zou, Yuewei Lin, Haibin Ling

To the best of our knowledge, TOTB is the first benchmark dedicated to transparent object tracking.

Object Object Tracking +1


LaSOT: A High-quality Large-scale Single Object Tracking Benchmark

1 code implementation • 8 Sep 2020 • Heng Fan, Hexin Bai, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Harshit, Mingzhen Huang, Juehuan Liu, Yong Xu, Chunyuan Liao, Lin Yuan, Haibin Ling

The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real-world video footage, such as the targets disappearing and re-appearing.

Object Tracking Visual Tracking +1


Detection and Tracking Meet Drones Challenge

2 code implementations • 16 Jan 2020 • Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, QinGhua Hu, Haibin Ling

We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking.

Multi-Object Tracking Object +2


Semantic-Aware Label Placement for Augmented Reality in Street View

no code implementations • 15 Dec 2019 • Jianqing Jia, Semir Elezovikj, Heng Fan, Shuojin Yang, Jing Liu, Wei Guo, Chiu C. Tan, Haibin Ling

Our solution encodes the constraints for placing labels in an optimization problem to obtain the final label layout, and the labels will be placed in appropriate positions to reduce the chances of overlaying important real-world objects in street view AR scenarios.


TracKlinic: Diagnosis of Challenge Factors in Visual Tracking

no code implementations • 18 Nov 2019 • Heng Fan, Fan Yang, Peng Chu, Lin Yuan, Haibin Ling

For the analysis component, given the tracking results on all sequences, it investigates the behavior of the tracker under each individual factor and generates the report automatically.

Visual Tracking


ClsGAN: Selective Attribute Editing Model Based On Classification Adversarial Network

1 code implementation • 25 Oct 2019 • Liu Ying, Heng Fan, Fuchuan Ni, Jinhai Xiang

In addition, to further improve the transfer accuracy of generated images, an attribute adversarial classifier (referred to as Atta-cls) is introduced to guide the generator from the perspective of attribute through learning the defects of attribute transfer images.

Attribute Classification +3


Experimental demonstration of entanglement-enabled universal quantum cloning in a circuit

no code implementations • 7 Sep 2019 • Zhen-Biao Yang, Pei-Rong Han, Xin-Jie Huang, Wen Ning, Hekang Li, Kai Xu, Dongning Zheng, Heng Fan, Shi-Biao Zheng

The no-cloning theorem forbids perfect cloning of an unknown quantum state.

Quantum Physics


Clustered Object Detection in Aerial Images

1 code implementation • ICCV 2019 • Fan Yang, Heng Fan, Peng Chu, Erik Blasch, Haibin Ling

The key components in ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet).

Clustering Object +2


Online Multi-Object Tracking with Instance-Aware Tracker and Dynamic Model Refreshment

no code implementations • 21 Feb 2019 • Peng Chu, Heng Fan, Chiu C. Tan, Haibin Ling

To address this issue, in this paper we propose an instance-aware tracker to integrate SOT techniques for MOT by encoding awareness both within and between target models.

Multi-Object Tracking Online Multi-Object Tracking


Siamese Cascaded Region Proposal Networks for Real-Time Visual Tracking

no code implementations • CVPR 2019 • Heng Fan, Haibin Ling

C-RPN is trained end-to-end with the multi-task loss function.

Real-Time Visual Tracking Region Proposal


Scene Parsing via Dense Recurrent Neural Networks with Attentional Selection

no code implementations • 9 Nov 2018 • Heng Fan, Peng Chu, Longin Jan Latecki, Haibin Ling

Recurrent neural networks (RNNs) have shown the ability to improve scene parsing through capturing long-range dependencies among image units.

Scene Labeling


LaSOT: A High-quality Benchmark for Large-scale Single Object Tracking

1 code implementation • CVPR 2019 • Heng Fan, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Hexin Bai, Yong Xu, Chunyuan Liao, Haibin Ling

In this paper, we present LaSOT, a high-quality benchmark for Large-scale Single Object Tracking.

Object Tracking


Robust and Efficient Graph Correspondence Transfer for Person Re-identification

no code implementations • 15 May 2018 • Qin Zhou, Heng Fan, Hua Yang, Hang Su, Shibao Zheng, Shuang Wu, Haibin Ling

To address this problem, in this paper, we present a robust and efficient graph correspondence transfer (REGCT) approach for explicit spatial alignment in Re-ID.

Graph Matching Person Re-Identification


Graph Correspondence Transfer for Person Re-identification

no code implementations • 1 Apr 2018 • Qin Zhou, Heng Fan, Shibao Zheng, Hang Su, Xinzhe Li, Shuang Wu, Haibin Ling

In this paper, we propose a graph correspondence transfer (GCT) approach for person re-identification.

Graph Matching Person Re-Identification


Weighted Bilinear Coding over Salient Body Parts for Person Re-identification

no code implementations • 22 Mar 2018 • Zhigang Chang, Qin Zhou, Heng Fan, Hang Su, Hua Yang, Shibao Zheng, Haibin Ling

Meanwhile, a weighting scheme is applied on the bilinear coding to adaptively adjust the weights of local features at different locations based on their importance in recognition, further improving the discriminability of feature aggregation.

Person Re-Identification


Parallel Tracking and Verifying

no code implementations • 30 Jan 2018 • Heng Fan, Haibin Ling

Being intensively studied, visual object tracking has witnessed great advances in either speed (e.g., with correlation filters) or accuracy (e.g., with deep features).

Visual Object Tracking


Dense Recurrent Neural Networks for Scene Labeling

no code implementations • 21 Jan 2018 • Heng Fan, Haibin Ling

Recently recurrent neural networks (RNNs) have demonstrated the ability to improve scene labeling through capturing long-range dependencies among image units.

Scene Labeling


Scalable Quantum Tomography with Fidelity Estimation

1 code implementation • 8 Dec 2017 • Jun Wang, Zhao-Yu Han, Song-Bo Wang, Zeyang Li, Liang-Zhu Mu, Heng Fan, Lei Wang

We propose a quantum tomography scheme for pure qudit systems which adopts random base measurements and generative learning methods, along with a built-in fidelity estimation approach to assess the reliability of the tomographic states.

Quantum Physics


Unsupervised Generative Modeling Using Matrix Product States

1 code implementation • 6 Sep 2017 • Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, Pan Zhang

Generative modeling, which learns a joint probability distribution from data and generates samples according to it, is an important task in machine learning and artificial intelligence.

BIG-bench Machine Learning


Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking

no code implementations • ICCV 2017 • Heng Fan, Haibin Ling

In this paper we study the problem from a new perspective and present a novel parallel tracking and verifying (PTAV) framework, by taking advantage of the ubiquity of multi-thread techniques and borrowing from the success of parallel tracking and mapping in visual SLAM.

Visual Tracking


SANet: Structure-Aware Network for Visual Tracking

no code implementations • 21 Nov 2016 • Heng Fan, Haibin Ling

Convolutional neural networks (CNNs) have drawn increasing interest in visual tracking owing to their power in feature extraction.

General Classification Object +1


Multi-level Contextual RNNs with Attention Model for Scene Labeling

no code implementations • 8 Jul 2016 • Heng Fan, Xue Mei, Danil Prokhorov, Haibin Ling

Context in images is crucial for scene labeling, yet existing methods exploit only local context generated from a small surrounding area of an image patch or a pixel; long-range and global contextual information is ignored.

Scene Labeling
