1 code implementation • 15 Apr 2024 • Haoxing Chen, Yaohui Li, Zizheng Huang, Yan Hong, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang
Recent advancements in efficient transfer learning (ETL) have shown remarkable success in fine-tuning VLMs in limited-data scenarios, introducing only a few parameters to harness task-specific insights from VLMs.
no code implementations • 20 Dec 2023 • Haoxing Chen, Yaohui Li, Zhangxuan Gu, Zhuoer Xu, Jun Lan, Huaxiong Li
Image harmonization is a crucial technique in image composition that aims to seamlessly match the background by adjusting the foreground of composite images.
1 code implementation • 21 Nov 2023 • Haoxing Chen, Yaohui Li, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Huijia Zhu, Weiqiang Wang
Recent methods mainly focus on learning multi-modal features aligned with class names to enhance the generalization ability to unseen categories.
Ranked #1 on GZSL Video Classification on ActivityNet-GZSL (cls)
1 code implementation • 19 Aug 2023 • Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu
To address these challenges, we propose a controllable image composition method that unifies four tasks in one diffusion model: image blending, image harmonization, view synthesis, and generative composition.
no code implementations • 19 Aug 2023 • Qunsong Zeng, Jiawei Liu, Mingrui Jiang, Jun Lan, Yi Gong, Zhongrui Wang, Yida Li, Can Li, Jim Ignowski, Kaibin Huang
To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient baseband processors.
1 code implementation • NeurIPS 2023 • Haoxing Chen, Zhuoer Xu, Zhangxuan Gu, Jun Lan, Xing Zheng, Yaohui Li, Changhua Meng, Huijia Zhu, Weiqiang Wang
Specifically, we build our model on a diffusion model and carefully modify the network structure to enable the model to draw multilingual characters with the help of glyph and position information.
1 code implementation • CVPR 2023 • Zhangxuan Gu, Zhuoer Xu, Haoxing Chen, Jun Lan, Changhua Meng, Weiqiang Wang
Recent object detection approaches rely on pretrained vision-language models for image-text alignment.
2 code implementations • 6 Dec 2022 • Zhangxuan Gu, Haoxing Chen, Zhuoer Xu, Jun Lan, Changhua Meng, Weiqiang Wang
Extensive experimental results on COCO and LVIS show that DiffusionInst achieves competitive performance compared to existing instance segmentation models with various backbones, such as ResNet and Swin Transformers.
Ranked #8 on Instance Segmentation on LVIS v1.0 val
1 code implementation • 16 Nov 2022 • Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li
The MGD applies distinct convolutions to the foreground and background, learning representations of the foreground and background regions as well as their correlations to the global harmonization, which facilitates local visual consistency in the images much more efficiently.
Ranked #2 on Image Harmonization on HAdobe5k (1024×1024)
no code implementations • 7 May 2022 • Qunsong Zeng, Jiawei Liu, Jun Lan, Yi Gong, Zhongrui Wang, Yida Li, Kaibin Huang
To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient (UFEE) baseband processors.
1 code implementation • CVPR 2022 • Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, Liqing Zhang
Recently, various multimodal networks for Visually-Rich Document Understanding (VRDU) have been proposed, showing the promotion of transformers by integrating visual and layout information with the text embeddings.