Search Results for author: Zilong Huang

Found 28 papers, 18 papers with code

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

3 code implementations • 19 Jan 2024 • Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error.

Ranked #3 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)

Data Augmentation Monocular Depth Estimation +1

5,683

Harnessing Diffusion Models for Visual Perception with Meta Prompts

1 code implementation • 22 Dec 2023 • Qiang Wan, Zilong Huang, Bingyi Kang, Jiashi Feng, Li Zhang

Our key insight is to introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.

Ranked #2 on Semantic Segmentation on Cityscapes test (using extra training data)

Monocular Depth Estimation Pose Estimation +1

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

1 code implementation • 17 Jul 2023 • Yang Zhao, Zhijie Lin, Daquan Zhou, Zilong Huang, Jiashi Feng, Bingyi Kang

Our experiments show that BuboGPT achieves impressive multi-modality understanding and visual grounding abilities during interaction with humans.

Instruction Following Sentence +1

470

Disentangled Pre-training for Image Matting

1 code implementation • 3 Apr 2023 • Yanda Li, Zilong Huang, Gang Yu, Ling Chen, Yunchao Wei, Jianbo Jiao

The pre-training task is designed in a similar manner as image matting, where random trimap and alpha matte are generated to achieve an image disentanglement objective.

Disentanglement Image Matting

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation

1 code implementation • 30 Jan 2023 • Qiang Wan, Zilong Huang, Jiachen Lu, Gang Yu, Li Zhang

Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on ARM-based mobile devices on the ADE20K and Cityscapes datasets.

Image Classification Segmentation +1

239

Executing your Commands via Motion Diffusion in Latent Space

1 code implementation • CVPR 2023 • Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu

We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors.

Ranked #2 on Motion Synthesis on HumanAct12

Motion Synthesis

509

Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

no code implementations • 7 Nov 2022 • Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Jiaqi Li, Yiran Wang, Zihao Huang, Zhiguo Cao, Marcos V. Conde, Denis Sapozhnikov, Byeong Hyun Lee, Dongwon Park, Seongmin Hong, Joonhee Lee, Seunggyu Lee, Se Young Chun

Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks.

Bokeh Effect Rendering Depth Estimation +3

Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations

1 code implementation • 20 Oct 2022 • Fukun Yin, Wen Liu, Zilong Huang, Pei Cheng, Tao Chen, Gang Yu

Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis, which typically uses the coordinate-based multi-layer perceptrons (MLPs) to learn a continuous scene representation.

Novel View Synthesis

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

3 code implementations • CVPR 2022 • Wenqiang Zhang, Zilong Huang, Guozhong Luo, Tao Chen, Xinggang Wang, Wenyu Liu, Gang Yu, Chunhua Shen

Although vision transformers (ViTs) have achieved great success in computer vision, the heavy computational cost hampers their applications to dense prediction tasks such as semantic segmentation on mobile devices.

Segmentation Semantic Segmentation

373

SeMask: Semantically Masked Transformers for Semantic Segmentation

1 code implementation • arXiv 2021 • Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

To achieve this, we propose SeMask, a simple and effective framework that incorporates semantic information into the encoder with the help of a semantic attention operation.

Ranked #10 on Semantic Segmentation on Cityscapes val

Semantic Segmentation

243

Shuffle Transformer with Feature Alignment for Video Face Parsing

no code implementations • 16 Jun 2021 • Rui Zhang, Yang Han, Zilong Huang, Pei Cheng, Guozhong Luo, Gang Yu, Bin Fu

This is a short technical report introducing the solution of Team TCParser for the Short-video Face Parsing Track of the 3rd Person in Context (PIC) Workshop and Challenge at CVPR 2021.

Face Parsing

Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer

4 code implementations • 7 Jun 2021 • Zilong Huang, Youcheng Ben, Guozhong Luo, Pei Cheng, Gang Yu, Bin Fu

In this work, we revisit the spatial shuffle as an efficient way to build connections among windows.

Ranked #45 on Semantic Segmentation on ADE20K val

Image Classification object-detection +3

1,681

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

no code implementations • 17 May 2021 • Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, Jin-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, Zhenyu Li, Xianming Liu, Junjun Jiang, Wei-Chi Chen, Shayan Joya, Huanhuan Fan, Zhaobing Kang, Ang Li, Tianpeng Feng, Yang Liu, Chuannan Sheng, Jian Yin, Fausto T. Benavide

While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference.

Depth Estimation

Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation

no code implementations • 2 Apr 2021 • Zilong Huang, Wentian Hao, Xinggang Wang, Mingyuan Tao, Jianqiang Huang, Wenyu Liu, Xian-Sheng Hua

Despite their success for semantic segmentation, convolutional neural networks are ill-equipped for incremental learning, i.e., adapting the original segmentation model as new classes are available but the initial training data is not retained.

Class-Incremental Semantic Segmentation Incremental Learning +1

Human De-occlusion: Invisible Perception and Recovery for Humans

no code implementations • CVPR 2021 • Qiang Zhou, Shiyin Wang, Yitong Wang, Zilong Huang, Xinggang Wang

Besides, an Amodal Human Perception dataset (AHP) is collected to settle the task of human de-occlusion.

Human Parsing Instance Segmentation +1

High-Resolution Deep Image Matting

no code implementations • 14 Sep 2020 • Haichao Yu, Ning Xu, Zilong Huang, Yuqian Zhou, Humphrey Shi

Image matting is a key technique for image and video editing and composition.

Image Matting Video Editing +1

Deep Learning-Based Automated Image Segmentation for Concrete Petrographic Analysis

no code implementations • 21 May 2020 • Yu Song, Zilong Huang, Chuanyue Shen, Humphrey Shi, David A Lange

The standard petrography test method for measuring air voids in concrete (ASTM C457) requires a meticulous and long examination of sample phase composition under a stereomicroscope.

Image Segmentation Segmentation +1

The 1st Agriculture-Vision Challenge: Methods and Results

1 code implementation • 21 Apr 2020 • Mang Tik Chiu, Xingqian Xu, Kai Wang, Jennifer Hobbs, Naira Hovakimyan, Thomas S. Huang, Honghui Shi, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Ivan Dozier, Wyatt Dozier, Karen Ghandilyan, David Wilson, Hyunseong Park, Junhee Kim, Sungho Kim, Qinghui Liu, Michael C. Kampffmeyer, Robert Jenssen, Arnt B. Salberg, Alexandre Barbosa, Rodrigo Trevisan, Bingchen Zhao, Shaozuo Yu, Siwei Yang, Yin Wang, Hao Sheng, Xiao Chen, Jingyi Su, Ram Rajagopal, Andrew Ng, Van Thong Huynh, Soo-Hyung Kim, In-Seop Na, Ujjwal Baid, Shubham Innani, Prasad Dutande, Bhakti Baheti, Sanjay Talbar, Jianyu Tang

The first Agriculture-Vision Challenge aims to encourage research in developing novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset.

Segmentation Semantic Segmentation

AlignSeg: Feature-Aligned Segmentation Networks

1 code implementation • 24 Feb 2020 • Zilong Huang, Yunchao Wei, Xinggang Wang, Wenyu Liu, Thomas S. Huang, Humphrey Shi

Aggregating features in terms of different convolutional blocks or contextual embeddings has been proven to be an effective way to strengthen feature representations for semantic segmentation.

Segmentation Semantic Segmentation

124

Agriculture-Vision: A Large Aerial Image Database for Agricultural Pattern Analysis

2 code implementations • CVPR 2020 • Mang Tik Chiu, Xingqian Xu, Yunchao Wei, Zilong Huang, Alexander Schwing, Robert Brunner, Hrant Khachatrian, Hovnatan Karapetyan, Ivan Dozier, Greg Rose, David Wilson, Adrian Tudor, Naira Hovakimyan, Thomas S. Huang, Honghui Shi

To encourage research in computer vision for agriculture, we present Agriculture-Vision: a large-scale aerial farmland image dataset for semantic segmentation of agricultural patterns.

Segmentation Semantic Segmentation

SPGNet: Semantic Prediction Guidance for Scene Parsing

no code implementations • ICCV 2019 • Bowen Cheng, Liang-Chieh Chen, Yunchao Wei, Yukun Zhu, Zilong Huang, JinJun Xiong, Thomas Huang, Wen-mei Hwu, Honghui Shi

The multi-scale context module refers to the operations to aggregate feature responses from a large spatial extent, while the single-stage encoder-decoder structure encodes the high-level semantic information in the encoder path and recovers the boundary information in the decoder path.

Pose Estimation Scene Parsing +2

Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

1 code implementation • 2 Jul 2019 • Qiang Zhou, Zilong Huang, Lichao Huang, Yongchao Gong, Han Shen, Chang Huang, Wenyu Liu, Xinggang Wang

Video object segmentation (VOS) aims at pixel-level object tracking given only the annotations in the first frame.

Ranked #1 on Visual Object Tracking on YouTube-VOS 2018 (Jaccard (Seen) metric)

Object Object Tracking +4

117

CCNet: Criss-Cross Attention for Semantic Segmentation

4 code implementations • ICCV 2019 • Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, Thomas S. Huang

Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage.

Ranked #7 on Semantic Segmentation on FoodSeg103 (using extra training data)

Computational Efficiency Human Parsing +8

7,409

Devil in the Details: Towards Accurate Single and Multiple Human Parsing

2 code implementations • 17 Sep 2018 • Tao Ruan, Ting Liu, Zilong Huang, Yunchao Wei, Shikui Wei, Yao Zhao, Thomas Huang

Human parsing has received considerable interest due to its wide application potential.

Ranked #2 on Person Re-Identification on Market-1501-C

Human Parsing Person Re-Identification +1

212

Weakly-Supervised Semantic Segmentation Network With Deep Seeded Region Growing

1 code implementation • CVPR 2018 • Zilong Huang, Xinggang Wang, Jiasi Wang, Wenyu Liu, Jingdong Wang

Inspired by the traditional image segmentation method of seeded region growing, we propose to train a semantic segmentation network starting from the discriminative regions and progressively increase the pixel-level supervision using seeded region growing.

Ranked #38 on Weakly-Supervised Semantic Segmentation on COCO 2014 val (using extra training data)

Image Segmentation Segmentation +2

249

Object-Level Proposals

no code implementations • ICCV 2017 • Jianxiang Ma, Anlong Ming, Zilong Huang, Xinggang Wang, Yu Zhou

Edge and surface are two fundamental visual elements of an object.

Object object-detection +1

Point Linking Network for Object Detection

no code implementations • 12 Jun 2017 • Xinggang Wang, Kaibing Chen, Zilong Huang, Cong Yao, Wenyu Liu

Deep ConvNet-based object detectors, e.g., Faster R-CNN, YOLO and SSD, mainly focus on regressing the coordinates of the bounding box.

Object object-detection +1

Deep Patch Learning for Weakly Supervised Object Classification and Discovery

1 code implementation • 6 May 2017 • Peng Tang, Xinggang Wang, Zilong Huang, Xiang Bai, Wenyu Liu

Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered background.

Classification General Classification +3
