no code implementations • 20 Apr 2024 • Zhengcong Fei, Mingyuan Fan, Junshi Huang
Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps.
1 code implementation • 6 Apr 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang
Transformers have catalyzed advancements in computer vision and natural language processing (NLP) fields.
1 code implementation • 8 Feb 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang
We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone, functioning on raw patches or latent space.
no code implementations • 22 Dec 2023 • Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang
Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes.
no code implementations • 27 Nov 2023 • Zhengcong Fei, Mingyuan Fan, Junshi Huang
The target representations of those regions are extracted by the exponential moving average of the context encoder, i.e., the target encoder, on the whole spectrogram.
no code implementations • 2 Nov 2023 • Tianrui Hui, Zihan Ding, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu
Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption.
1 code implementation • 7 Aug 2023 • Yuchen Ma, Zhengcong Fei, Junshi Huang
The proposed framework generates a data-dependent path per token, adapting to the object scales and visual discrimination of tokens.
1 code implementation • CVPR 2023 • Duojun Huang, Jichang Li, Weikai Chen, Junshi Huang, Zhenhua Chai, Guanbin Li
To accommodate active learning and domain adaptation, two naturally different tasks, in a collaborative framework, we advocate that a customized learning strategy for the target data is the key to the success of ADA solutions.
no code implementations • 12 Apr 2023 • Zhengcong Fei, Mingyuan Fan, Junshi Huang
Recent works on personalized text-to-image generation usually learn to bind a special token with specific subjects or styles of a few given images by tuning its embedding through gradient descent.
1 code implementation • 1 Feb 2023 • Kaiheng Weng, Xiangxiang Chu, Xiaoming Xu, Junshi Huang, Xiaoming Wei
Thus, how to design a neural network to efficiently use the computing ability and memory bandwidth of hardware is a critical problem.
1 code implementation • CVPR 2023 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei
In this paper, we introduce a novel Generative Adversarial Network-like framework, referred to as GAN-MAE, where a generator is used to generate the masked patches according to the remaining visible patches, and a discriminator is employed to predict whether each patch is synthesized by the generator.
1 code implementation • CVPR 2023 • Tianrui Hui, Zizheng Xun, Fengguang Peng, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu
To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module which exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts.
Ranked #4 on RGB-T Tracking on RGBT210
no code implementations • 30 Nov 2022 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei
It is widely believed that the higher the uncertainty of a word in the caption, the more inter-correlated context information is required to determine it.
no code implementations • 5 Oct 2022 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang
Recently, Vector Quantized AutoRegressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by equally predicting discrete image tokens from the top left to bottom right in the latent space.
no code implementations • 5 Oct 2022 • Zhengcong Fei, Shuman Tian, Junshi Huang, Xiaoming Wei, Xiaolin Wei
Knowledge distillation is an approach that allows a single model to efficiently capture the approximate performance of an ensemble, but it scales poorly because it demands re-training whenever new teacher models are introduced.
1 code implementation • 11 Aug 2022 • Zihan Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Si Liu
To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic segmentation by simple combination.
1 code implementation • 22 Jul 2022 • Zhengcong Fei, Junshi Huang, Xiaoming Wei, Xiaolin Wei
Existing approaches to image captioning usually generate the sentence word-by-word from left to right, constrained to condition on local context, including the given image and previously generated words.
1 code implementation • CVPR 2022 • Zihan Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Jizhong Han, Si Liu
Referring video object segmentation aims to predict foreground labels for objects referred by natural language expressions in videos.
Ranked #6 on Referring Video Object Segmentation on MeViS
1 code implementation • CVPR 2021 • Tong Wu, Junshi Huang, Guangyu Gao, Xiaoming Wei, Xiaolin Wei, Xuan Luo, Chi Harold Liu
In inference, we directly use the activation masks from the DA layer as pseudo-labels for segmentation.
6 code implementations • CVPR 2021 • Mingyuan Fan, Shenqi Lai, Junshi Huang, Xiaoming Wei, Zhenhua Chai, Junfeng Luo, Xiaolin Wei
BiSeNet has proven to be a popular two-stream network for real-time segmentation.
Ranked #8 on Real-Time Semantic Segmentation on Cityscapes test
no code implementations • CVPR 2017 • Xuanyi Dong, Junshi Huang, Yi Yang, Shuicheng Yan
In this paper, we present a novel and general network structure towards accelerating the inference process of convolutional neural networks, which is more complicated in network structure yet with less inference complexity.
no code implementations • CVPR 2015 • Qiang Chen, Junshi Huang, Rogerio Feris, Lisa M. Brown, Jian Dong, Shuicheng Yan
We address the problem of describing people based on fine-grained clothing attributes.
no code implementations • ICCV 2015 • Junshi Huang, Rogerio S. Feris, Qiang Chen, Shuicheng Yan
To address this problem, we propose a Dual Attribute-aware Ranking Network (DARN) for retrieval feature learning.
no code implementations • 22 Jun 2014 • Yunchao Wei, Wei Xia, Junshi Huang, Bingbing Ni, Jian Dong, Yao Zhao, Shuicheng Yan
Convolutional Neural Networks (CNNs) have demonstrated promising performance in single-label image classification tasks.
no code implementations • CVPR 2014 • Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan
During each training stage, the SRD model learns a relational dictionary to capture consistent relationships between face appearance and shape, which are respectively modeled by the pose-indexed image features and the shape displacements for current estimated landmarks.