Search Results for author: Zilong Huang

Found 28 papers, 18 papers with code

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

3 code implementations19 Jan 2024 Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error.

Ranked #3 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Data Augmentation Monocular Depth Estimation +1

Harnessing Diffusion Models for Visual Perception with Meta Prompts

1 code implementation22 Dec 2023 Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang

Our key insight is to introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.

Ranked #2 on Semantic Segmentation on Cityscapes test (using extra training data)

Monocular Depth Estimation Pose Estimation +1

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

1 code implementation17 Jul 2023 Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during the interaction with human.

Instruction Following Sentence +1

Disentangled Pre-training for Image Matting

1 code implementation3 Apr 2023 Yanda Li, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao

The pre-training task is designed in a similar manner as image matting, where random trimap and alpha matte are generated to achieve an image disentanglement objective.

Disentanglement Image Matting

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation

1 code implementation30 Jan 2023 Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang

Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on the ARM-based mobile devices on the ADE20K and Cityscapes datasets.

Image Classification Segmentation +1

Executing your Commands via Motion Diffusion in Latent Space

1 code implementation CVPR 2023 Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu

We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors.

Motion Synthesis

Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations

1 code implementation20 Oct 2022 Fukun Yin, Wen Liu, Zilong Huang, Pei Cheng, Tao Chen, Gang Yu

Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis, which typically uses the coordinate-based multi-layer perceptrons (MLPs) to learn a continuous scene representation.

Novel View Synthesis

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

3 code implementations CVPR 2022 Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

Segmentation Semantic Segmentation

SeMask: Semantically Masked Transformers for Semantic Segmentation

1 code implementation arXiv 2021 Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

To achieve this, we propose SeMask, a simple and effective framework that incorporates semantic information into the encoder with the help of a semantic attention operation.

Semantic Segmentation

Shuffle Transformer with Feature Alignment for Video Face Parsing

no code implementations16 Jun 2021 Rui Zhang, Yang Han, Zilong Huang, Pei Cheng, Guozhong Luo, Gang Yu, Bin Fu

This is a short technical report introducing the solution of the Team TCParser for Short-video Face Parsing Track of The 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021.

Face Parsing

Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation

no code implementations2 Apr 2021 Zilong Huang, Wentian Hao, Xinggang Wang, Mingyuan Tao, Jianqiang Huang, Wenyu Liu, Xian-Sheng Hua

Despite their success for semantic segmentation, convolutional neural networks are ill-equipped for incremental learning, \ie, adapting the original segmentation model as new classes are available but the initial training data is not retained.

Class-Incremental Semantic Segmentation Incremental Learning +1

Deep Learning-Based Automated Image Segmentation for Concrete Petrographic Analysis

no code implementations21 May 2020 Yu Song, Zilong Huang, Chuanyue Shen, Humphrey Shi, David A Lange

The standard petrography test method for measuring air voids in concrete (ASTM C457) requires a meticulous and long examination of sample phase composition under a stereomicroscope.

Image Segmentation Segmentation +1

AlignSeg: Feature-Aligned Segmentation Networks

1 code implementation24 Feb 2020 Zilong Huang, Yunchao Wei, Xinggang Wang, Wenyu Liu, Thomas S. Huang, Humphrey Shi

Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation.

Segmentation Semantic Segmentation

SPGNet: Semantic Prediction Guidance for Scene Parsing

no code implementations ICCV 2019 Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, JinJun Xiong, Thomas Huang, Wen-mei Hwu, Honghui Shi

The multi-scale context module refers to the operations to aggregate feature responses from a large spatial extent, while the single-stage encoder-decoder structure encodes the high-level semantic information in the encoder path and recovers the boundary information in the decoder path.

Pose Estimation Scene Parsing +2

CCNet: Criss-Cross Attention for Semantic Segmentation

4 code implementations ICCV 2019 Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, Thomas S. Huang

Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage.

Ranked #7 on Semantic Segmentation on FoodSeg103 (using extra training data)

Computational Efficiency Human Parsing +8

Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing

1 code implementation CVPR 2018 Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, Jingdong Wang

Inspired by the traditional image segmentation methods of seeded region growing, we propose to train a semantic segmentation network starting from the discriminative regions and progressively increase the pixel-level supervision using by seeded region growing.

Ranked #38 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)

Image Segmentation Segmentation +2

Point Linking Network for Object Detection

no code implementations12 Jun 2017 Xinggang Wang, Kaibing Chen, Zilong Huang, Cong Yao, Wenyu Liu

The deep ConvNets based object detectors mainly focus on regressing the coordinates of bounding box, e. g., Faster-R-CNN, YOLO and SSD.

Object object-detection +1

Deep Patch Learning for Weakly Supervised Object Classification and Discovery

1 code implementation6 May 2017 Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu

Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered background.

Classification General Classification +3

Cannot find the paper you are looking for? You can Submit a new open access paper.