Search Results for author: Junshi Huang

Found 25 papers, 12 papers with code

Music Consistency Models

no code implementations • 20 Apr 2024 • Zhengcong Fei, Mingyuan Fan, Junshi Huang

Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps.

Computational Efficiency, Music Generation (+1 more)

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

1 code implementation • 6 Apr 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Debang Li, Junshi Huang

Transformers have catalyzed advancements in the computer vision and natural language processing (NLP) fields.

Image Generation, Unconditional Image Generation

Scalable Diffusion Models with State Space Backbone

1 code implementation • 8 Feb 2024 • Zhengcong Fei, Mingyuan Fan, Changqian Yu, Junshi Huang

We endeavor to train diffusion models for image data, wherein the traditional U-Net backbone is supplanted by a state space backbone operating on raw patches or in latent space.

Conditional Image Generation

Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

no code implementations • 22 Dec 2023 • Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang

Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes.

Denoising

A-JEPA: Joint-Embedding Predictive Architecture Can Listen

no code implementations • 27 Nov 2023 • Zhengcong Fei, Mingyuan Fan, Junshi Huang

The target representations of those regions are extracted by the exponential moving average of the context encoder, i.e., the target encoder, on the whole spectrogram.

Self-Supervised Learning

Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

no code implementations • 2 Nov 2023 • Tianrui Hui, Zihan Ding, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption.

Decoder, Object

DiT: Efficient Vision Transformers with Dynamic Token Routing

1 code implementation • 7 Aug 2023 • Yuchen Ma, Zhengcong Fei, Junshi Huang

The proposed framework generates a data-dependent path per token, adapting to the object scales and visual discrimination of tokens.

Instance Segmentation, Object (+3 more)

Divide and Adapt: Active Domain Adaptation via Customized Learning

1 code implementation • CVPR 2023 • Duojun Huang, Jichang Li, Weikai Chen, Junshi Huang, Zhenhua Chai, Guanbin Li

To accommodate active learning and domain adaptation, two naturally different tasks, within a collaborative framework, we argue that a customized learning strategy for the target data is the key to the success of ADA solutions.

Active Learning, Informativeness (+3 more)

Gradient-Free Textual Inversion

no code implementations • 12 Apr 2023 • Zhengcong Fei, Mingyuan Fan, Junshi Huang

Recent works on personalized text-to-image generation usually learn to bind a special token with specific subjects or styles of a few given images by tuning its embedding through gradient descent.

Computational Efficiency, Dimensionality Reduction (+1 more)

EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design

1 code implementation • 1 Feb 2023 • Kaiheng Weng, Xiangxiang Chu, Xiaoming Xu, Junshi Huang, Xiaoming Wei

Thus, designing a neural network that efficiently uses the computing power and memory bandwidth of hardware is a critical problem.

Object Detection

Masked Auto-Encoders Meet Generative Adversarial Networks and Beyond

1 code implementation • CVPR 2023 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

In this paper, we introduce a novel GAN-like framework, referred to as GAN-MAE, in which a generator produces the masked patches from the remaining visible patches and a discriminator predicts whether each patch was synthesized by the generator.

Representation Learning

Bridging Search Region Interaction With Template for RGB-T Tracking

1 code implementation • CVPR 2023 • Tianrui Hui, Zizheng Xun, Fengguang Peng, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Jiao Dai, Jizhong Han, Si Liu

To alleviate these limitations, we propose a novel Template-Bridged Search region Interaction (TBSI) module which exploits templates as the medium to bridge the cross-modal interaction between RGB and TIR search regions by gathering and distributing target-relevant object and environment contexts.

Ranked #4 on RGB-T Tracking on RGBT210

RGB-T Tracking, Template Matching

Uncertainty-Aware Image Captioning

no code implementations • 30 Nov 2022 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang, Xiaoming Wei, Xiaolin Wei

It is widely believed that the higher the uncertainty of a word in the caption, the more inter-correlated context information is required to determine it.

Caption Generation, Image Captioning (+1 more)

Progressive Text-to-Image Generation

no code implementations • 5 Oct 2022 • Zhengcong Fei, Mingyuan Fan, Li Zhu, Junshi Huang

Recently, Vector Quantized AutoRegressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by equally predicting discrete image tokens from the top left to the bottom right in the latent space.

Denoising, Text-to-Image Generation

Meta-Ensemble Parameter Learning

no code implementations • 5 Oct 2022 • Zhengcong Fei, Shuman Tian, Junshi Huang, Xiaoming Wei, Xiaolin Wei

Knowledge distillation allows a single model to efficiently approximate the performance of an ensemble, but it scales poorly because the student must be retrained whenever new teacher models are introduced.

Knowledge Distillation, Meta-Learning

PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

1 code implementation • 11 Aug 2022 • Zihan Ding, Zi-han Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Xiaolin Wei, Si Liu

To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic segmentation by simple combination.

Panoptic Segmentation, Segmentation (+1 more)

Efficient Modeling of Future Context for Image Captioning

1 code implementation • 22 Jul 2022 • Zhengcong Fei, Junshi Huang, Xiaoming Wei, Xiaolin Wei

Existing approaches to image captioning usually generate the sentence word by word from left to right, conditioned only on local context, including the given image and previously generated words.

Image Captioning, Sentence (+1 more)

Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation

1 code implementation • CVPR 2022 • Zihan Ding, Tianrui Hui, Junshi Huang, Xiaoming Wei, Jizhong Han, Si Liu

Referring video object segmentation aims to predict foreground labels for objects referred to by natural language expressions in videos.

Ranked #6 on Referring Video Object Segmentation on MeViS

Denoising, Referring Video Object Segmentation (+2 more)

Embedded Discriminative Attention Mechanism for Weakly Supervised Semantic Segmentation

1 code implementation • CVPR 2021 • Tong Wu, Junshi Huang, Guangyu Gao, Xiaoming Wei, Xiaolin Wei, Xuan Luo, Chi Harold Liu

At inference, we directly use the activation masks from the DA layer as pseudo-labels for segmentation.

Segmentation, Weakly Supervised Semantic Segmentation (+1 more)

Rethinking BiSeNet For Real-time Semantic Segmentation

6 code implementations • CVPR 2021 • Mingyuan Fan, Shenqi Lai, Junshi Huang, Xiaoming Wei, Zhenhua Chai, Junfeng Luo, Xiaolin Wei

BiSeNet has proven to be a popular two-stream network for real-time segmentation.

Ranked #8 on Real-Time Semantic Segmentation on Cityscapes test

Decoder, Dichotomous Image Segmentation (+4 more)

More is Less: A More Complicated Network with Less Inference Complexity

no code implementations • CVPR 2017 • Xuanyi Dong, Junshi Huang, Yi Yang, Shuicheng Yan

In this paper, we present a novel and general network structure for accelerating the inference of convolutional neural networks; the network is more complicated in structure yet has lower inference complexity.

Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes

no code implementations • CVPR 2015 • Qiang Chen, Junshi Huang, Rogerio Feris, Lisa M. Brown, Jian Dong, Shuicheng Yan

We address the problem of describing people based on fine-grained clothing attributes.

Attribute, Domain Adaptation

Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network

no code implementations • ICCV 2015 • Junshi Huang, Rogerio S. Feris, Qiang Chen, Shuicheng Yan

To address this problem, we propose a Dual Attribute-aware Ranking Network (DARN) for retrieval feature learning.

Attribute, Image Retrieval (+2 more)

CNN: Single-label to Multi-label

no code implementations • 22 Jun 2014 • Yunchao Wei, Wei Xia, Junshi Huang, Bingbing Ni, Jian Dong, Yao Zhao, Shuicheng Yan

Convolutional Neural Networks (CNNs) have demonstrated promising performance in single-label image classification tasks.

Image Classification

Towards Multi-view and Partially-Occluded Face Alignment

no code implementations • CVPR 2014 • Junliang Xing, Zhiheng Niu, Junshi Huang, Weiming Hu, Shuicheng Yan

During each training stage, the SRD model learns a relational dictionary to capture consistent relationships between face appearance and shape, which are modeled, respectively, by the pose-indexed image features and the shape displacements for the currently estimated landmarks.

Face Alignment
