Search Results for author: Rongyao Fang

Found 12 papers, 9 papers with code

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

1 code implementation • 19 Mar 2024 • Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li

In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions.

Text-to-Image Generation

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation

1 code implementation • 30 Nov 2023 • Rongyao Fang, Shilin Yan, Zhaoyang Huang, Jingqiu Zhou, Hao Tian, Jifeng Dai, Hongsheng Li

In this work, we introduce InstructSeq, an instruction-conditioned multi-modal modeling framework that unifies diverse vision tasks through flexible natural language control and handling of both visual and textual data.

Image Captioning, Referring Expression (+2 more)

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

1 code implementation • 9 Mar 2023 • Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Yu Qiao

To alleviate this, previous methods simply replace the pixel reconstruction targets of the 75% masked tokens with encoded features from pre-trained image-image (DINO) or image-language (CLIP) contrastive learning.

Contrastive Learning

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

1 code implementation • 2 Mar 2023 • Rongyao Fang, Peng Gao, Aojun Zhou, Yingjie Cai, Si Liu, Jifeng Dai, Hongsheng Li

The first method is One-to-many Matching via Data Augmentation (denoted as DataAug-DETR).

Data Augmentation, Object Detection (+1 more)

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

3 code implementations • 19 Jul 2022 • Renrui Zhang, Wei Zhang, Rongyao Fang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

On top of that, the performance of Tip-Adapter can be further boosted to state-of-the-art on ImageNet by fine-tuning the cache model for 10× fewer epochs than existing methods, which is both effective and efficient.

Retrieval, Transfer Learning

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

3 code implementations • 28 May 2022 • Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao

By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% over the second-best method, and substantially benefits few-shot classification, part segmentation, and 3D object detection with its hierarchical pre-training scheme.

Ranked #4 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Object Detection, 3D Point Cloud Linear Classification (+5 more)

RBGNet: Ray-based Grouping for 3D Object Detection

1 code implementation • CVPR 2022 • Haiyang Wang, Shaoshuai Shi, Ze Yang, Rongyao Fang, Qi Qian, Hongsheng Li, Bernt Schiele, Liwei Wang

To learn better representations of object shape that enhance cluster features for predicting 3D boxes, we propose a ray-based feature grouping module, which aggregates point-wise features on object surfaces using a group of determined rays uniformly emitted from cluster centers.

Ranked #13 on 3D Object Detection on ScanNetV2

3D Object Detection, Object (+1 more)

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation • 6 Nov 2021 • Renrui Zhang, Rongyao Fang, Wei Zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposed fine-tuning a lightweight residual feature adapter, which significantly improves few-shot classification performance.

Language Modelling, Transfer Learning

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

2 code implementations • 9 Oct 2021 • Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.

Prompt Engineering, Representation Learning

Learning Longterm Representations for Person Re-Identification Using Radio Signals

no code implementations • CVPR 2020 • Lijie Fan, Tianhong Li, Rongyao Fang, Rumen Hristov, Yuan Yuan, Dina Katabi

RF signals traverse clothes and reflect off the human body; thus they can be used to extract more persistent human-identifying features like body size and shape.

Person Re-Identification, Privacy Preserving

Probabilistic Radiomics: Ambiguous Diagnosis with Controllable Shape Analysis

no code implementations • 20 Oct 2019 • Jiancheng Yang, Rongyao Fang, Bingbing Ni, Yamin Li, Yi Xu, Linguo Li

The final diagnosis is obtained by combining the ambiguity prior sample with the lesion representation, and the whole network, named DenseSharp+, is end-to-end trainable.

Probabilistic Deep Learning

Adversarial Attack and Defense on Point Sets

no code implementations • 28 Feb 2019 • Jiancheng Yang, Qiang Zhang, Rongyao Fang, Bingbing Ni, Jinxian Liu, Qi Tian

A set of novel 3D point cloud attack operations is proposed via pointwise gradient perturbation and adversarial point attachment/detachment.

Adversarial Attack
