Search Results for author: Yuhui Yuan

Found 34 papers, 22 papers with code

DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing

no code implementations • 21 Mar 2024 • Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, Shanghang Zhang

Second, we propose an instruction-guided latent fusion that pastes the multi-layered latent representations onto a canvas latent.

Text-to-Image Generation

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

no code implementations • 14 Mar 2024 • Zeyu Liu, Weicong Liang, Zhanhao Liang, Chong Luo, Ji Li, Gao Huang, Yuhui Yuan

Visual text rendering poses a fundamental challenge for contemporary text-to-image generation models, with the core problem lying in text encoder deficiencies.

Text-to-Image Generation

Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

no code implementations • 15 Dec 2023 • Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang

In this paper, we present a novel two-stage approach that fully utilizes the information provided by the reference image to establish a customized knowledge prior for image-to-3D generation.

3D Generation Image to 3D +1

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

no code implementations • 30 Nov 2023 • Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai, Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation.

Text-to-Image Generation Text-to-Video Generation +1

ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models

no code implementations • 30 Nov 2023 • Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications.

Text-to-Video Generation Video Generation

COLE: A Hierarchical Generation Framework for Multi-Layered and Editable Graphic Design

no code implementations • 28 Nov 2023 • Peidong Jia, Chenxuan Li, Yuhui Yuan, Zeyu Liu, Yichao Shen, Bohan Chen, Xingru Chen, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo

Our COLE system comprises multiple fine-tuned Large Language Models (LLMs), Large Multimodal Models (LMMs), and Diffusion Models (DMs), each specifically tailored for design-aware layer-wise captioning, layout planning, reasoning, and the task of generating images and text.

Image Generation

CCEdit: Creative and Controllable Video Editing via Diffusion Models

no code implementations • 28 Sep 2023 • Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

The versatility of our framework is demonstrated through a diverse range of choices in both structure representations and personalized T2I models, as well as the option to provide the edited key frame.

Text-to-Image Generation Video Editing

Mask-Attention-Free Transformer for 3D Instance Segmentation

1 code implementation • ICCV 2023 • Xin Lai, Yuhui Yuan, Ruihang Chu, Yukang Chen, Han Hu, Jiaya Jia

Therefore, we abandon the mask attention design and resort to an auxiliary center regression task instead.

3D Instance Segmentation Position +2

Exploring Predicate Visual Context in Detecting Human-Object Interactions

1 code implementation • ICCV 2023 • Frederic Z. Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong, Stephen Gould

Recently, the DETR framework has emerged as the dominant approach for human-object interaction (HOI) research.

Ranked #2 on Human-Object Interaction Detection on HICO-DET

Human-Object Interaction Detection Object

V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection

1 code implementation • 8 Aug 2023 • Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo

We introduce a highly performant 3D object detector for point clouds using the DETR framework.

Ranked #2 on 3D Object Detection on ScanNetV2

3D Object Detection object-detection +1

Mask Frozen-DETR: High Quality Instance Segmentation with One GPU

no code implementations • 7 Aug 2023 • Zhanhao Liang, Yuhui Yuan

In this paper, we aim to study how to build a strong instance segmenter with minimal training time and GPUs, as opposed to the majority of current approaches that pursue more accurate instance segmenters by building more advanced frameworks at the cost of longer training time and higher GPU requirements.

Ranked #3 on Instance Segmentation on COCO minival (using extra training data)

Instance Segmentation object-detection +2

DETR Doesn't Need Multi-Scale or Locality Design

1 code implementation • 3 Aug 2023 • Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder.

Revisiting DETR Pre-training for Object Detection

no code implementations • 2 Aug 2023 • Yan Ma, Weicong Liang, Bohan Chen, Yiduo Hao, BoJian Hou, Xiangyu Yue, Chao Zhang, Yuhui Yuan

Motivated by the remarkable achievements of DETR-based approaches on COCO object detection and segmentation benchmarks, recent endeavors have been directed towards elevating their performance through self-supervised pre-training of Transformers while preserving a frozen backbone.

Object object-detection +1

LISA: Reasoning Segmentation via Large Language Model

2 code implementations • 1 Aug 2023 • Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia

In this work, we propose a new segmentation task: reasoning segmentation.

Language Modelling Large Language Model +3

Space Engage: Collaborative Space Supervision for Contrastive-based Semi-Supervised Semantic Segmentation

no code implementations • ICCV 2023 • Changqi Wang, Haoyu Xie, Yuhui Yuan, Chong Fu, Xiangyu Yue

To improve the robustness of representations, powerful methods introduce a pixel-wise contrastive learning approach in latent space (i.e., representation space) that aggregates the representations to their prototypes in a fully supervised manner.

Contrastive Learning Semi-Supervised Semantic Segmentation

detrex: Benchmarking Detection Transformers

1 code implementation • 12 Jun 2023 • Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

To address this issue, we develop a unified, highly modular, and lightweight codebase called detrex, which supports a majority of the mainstream DETR-based instance recognition algorithms, covering various fundamental tasks, including object detection, segmentation, and pose estimation.

Benchmarking object-detection +2

GlyphControl: Glyph Conditional Control for Visual Text Generation

1 code implementation • NeurIPS 2023 • Yukang Yang, Dongnan Gui, Yuhui Yuan, Weicong Liang, Haisong Ding, Han Hu, Kai Chen

We evaluate the effectiveness of our approach by measuring OCR-based metrics, CLIP score, and FID of the generated visual text.

Optical Character Recognition (OCR) Text Generation

DETR Does Not Need Multi-Scale or Locality Design

1 code implementation • ICCV 2023 • Yutong Lin, Yuhui Yuan, Zheng Zhang, Chen Li, Nanning Zheng, Han Hu

ClipCrop: Conditioned Cropping Driven by Vision-Language Model

no code implementations • 21 Nov 2022 • Zhihang Zhong, Mingxi Cheng, Zhirong Wu, Yuhui Yuan, Yinqiang Zheng, Ji Li, Han Hu, Stephen Lin, Yoichi Sato, Imari Sato

Image cropping has progressed tremendously under the data-driven paradigm.

Image Cropping Language Modelling

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

4 code implementations • 3 Oct 2022 • Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, WeiHong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Han Hu

Vision transformers have recently achieved competitive results across various vision tasks but still suffer from heavy computation costs when processing a large number of tokens.

Clustering Depth Estimation +6

DETRs with Hybrid Matching

8 code implementations • CVPR 2023 • Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, WeiHong Lin, Lei Sun, Chao Zhang, Han Hu

One-to-one set matching is a key design for DETR to establish its end-to-end capability, so that object detection does not require a hand-crafted NMS (non-maximum suppression) to remove duplicate detections.

Object Detection Pose Estimation +2

Region Rebalance for Long-Tailed Semantic Segmentation

5 code implementations • 5 Apr 2022 • Jiequan Cui, Yuhui Yuan, Zhisheng Zhong, Zhuotao Tian, Han Hu, Stephen Lin, Jiaya Jia

In this paper, we study the problem of class imbalance in semantic segmentation.

Ranked #21 on Semantic Segmentation on ADE20K

Segmentation Semantic Segmentation

RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation

2 code implementations • 8 Mar 2022 • Haodi He, Yuhui Yuan, Xiangyu Yue, Han Hu

Given an input image or video, our framework first conducts multi-label classification over the complete label set, then sorts the labels and selects a small subset according to their class confidence scores.

Classification Instance Segmentation +6

HRFormer: High-Resolution Vision Transformer for Dense Prediction

2 code implementations • NeurIPS 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that produces low-resolution representations and has high memory and computational cost.

Pose Estimation Semantic Segmentation +1

HRFormer: High-Resolution Transformer for Dense Prediction

1 code implementation • 18 Oct 2021 • Yuhui Yuan, Rao Fu, Lang Huang, WeiHong Lin, Chao Zhang, Xilin Chen, Jingdong Wang

Ranked #3 on Pose Estimation on AIC

Image Classification Multi-Person Pose Estimation +2

Conditional DETR for Fast Training Convergence

3 code implementations • ICCV 2021 • Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang

Our approach, named conditional DETR, learns a conditional spatial query from the decoder embedding for decoder multi-head cross-attention.

Object object-detection +1

Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

3 code implementations • CVPR 2021 • Xiaokang Chen, Yuhui Yuan, Gang Zeng, Jingdong Wang

Our approach imposes the consistency on two segmentation networks perturbed with different initialization for the same input image.

Ranked #2 on Semi-Supervised Semantic Segmentation on WoodScape

Segmentation Semi-Supervised Semantic Segmentation

SegFix: Model-Agnostic Boundary Refinement for Segmentation

4 code implementations • ECCV 2020 • Yuhui Yuan, Jingyi Xie, Xilin Chen, Jingdong Wang

We present a model-agnostic post-processing scheme to improve the boundary quality for the segmentation result that is generated by any existing segmentation model.

Segmentation

Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification

1 code implementation • ICCV 2019 • Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jinge Yao, Kai Han

On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes.

Ranked #59 on Person Re-Identification on DukeMTMC-reID

Human Parsing Person Re-Identification

Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation

11 code implementations • ECCV 2020 • Yuhui Yuan, Xiaokang Chen, Xilin Chen, Jingdong Wang

We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.

Ranked #5 on Semantic Segmentation on LIP val

Object Segmentation +1

Interlaced Sparse Self-Attention for Semantic Segmentation

6 code implementations • 29 Jul 2019 • Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

There are two successive attention modules each estimating a sparse affinity matrix.

Segmentation Semantic Segmentation

OCNet: Object Context Network for Scene Parsing

8 code implementations • 4 Sep 2018 • Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

To capture richer context information, we further combine our interlaced sparse self-attention scheme with the conventional multi-scale context schemes including pyramid pooling and atrous spatial pyramid pooling.

Ranked #9 on Semantic Segmentation on Trans10K

Object Relation +2

Feature Incay for Representation Regularization

no code implementations • ICLR 2018 • Yuhui Yuan, Kuiyuan Yang, Chao Zhang

Thus, we propose feature incay to also regularize representation learning, which favors feature vectors with large norm when the samples can be correctly classified.

Multi-class Classification Representation Learning

Hard-Aware Deeply Cascaded Embedding

1 code implementation • ICCV 2017 • Yuhui Yuan, Kuiyuan Yang, Chao Zhang

This motivates us to ensemble a set of models with different complexities in a cascaded manner and mine hard examples adaptively: a sample is judged by a series of models with increasing complexities and only updates models that consider the sample a hard case.

Ranked #14 on Image Retrieval on SOP

Metric Learning
