Search Results for author: Biao Gong

Found 12 papers, 5 papers with code

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

1 code implementation, 25 Dec 2023. Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Following such a pipeline, we study the effect of doubling the scale of the training set (i.e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9.67 to 8.19 and FVD from 484 to 441), demonstrating the scalability of our approach.

Text-to-Image Generation, Text-to-Video Generation

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

no code implementations, 28 Nov 2023. Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou

Existing text-to-image (T2I) diffusion models usually struggle to interpret complex prompts, especially those with quantity, object-attribute binding, and multi-subject descriptions.

Attribute, Denoising

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

no code implementations, 27 Nov 2023. Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang

Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle in decoupling actions from context features, including appearance.

Text-to-Image Generation

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations, 27 Nov 2023. Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.

Text-to-Image Generation

Logic Diffusion for Knowledge Graph Reasoning

no code implementations, 6 Jun 2023. Xiaoying Xie, Biao Gong, Yiliang Lv, Zhen Han, Guoshuai Zhao, Xueming Qian

Most recent works focus on answering first-order logical queries to explore knowledge graph reasoning via multi-hop logic predictions.

Selective and Collaborative Influence Function for Efficient Recommendation Unlearning

no code implementations, 20 Apr 2023. Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Biao Gong, Jun Wang

In this paper, we first identify two main disadvantages of directly applying existing unlearning methods in the context of recommendation, i.e., (i) unsatisfactory efficiency for large-scale recommendation models and (ii) destruction of collaboration across users and items.

Recommendation Systems

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

1 code implementation, 27 Mar 2023. Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, Donglin Wang

Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs.

Compositional Zero-Shot Learning, Object

ViM: Vision Middleware for Unified Downstream Transferring

no code implementations, ICCV 2023. Yutong Feng, Biao Gong, Jianwen Jiang, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

ViM consists of a zoo of lightweight plug-in modules, each of which is independently learned on a midstream dataset with a shared frozen backbone.

UKnow: A Unified Knowledge Protocol for Common-Sense Reasoning and Vision-Language Pre-training

1 code implementation, 14 Feb 2023. Biao Gong, Xiaoying Xie, Yutong Feng, Yiliang Lv, Yujun Shen, Deli Zhao

This work presents a unified knowledge protocol, called UKnow, which facilitates knowledge-based studies from the perspective of data.

Common Sense Reasoning

VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

1 code implementation, CVPR 2023. Siteng Huang, Biao Gong, Yulin Pan, Jianwen Jiang, Yiliang Lv, Yuyuan Li, Donglin Wang

Many recent studies leverage the pre-trained CLIP for text-video cross-modal retrieval by tuning the backbone with additional heavy modules, which not only brings a huge computational burden with many more parameters, but also leads to forgetting of knowledge from upstream models.

Cross-Modal Retrieval, Retrieval

Deep Multi-View Enhancement Hashing for Image Retrieval

no code implementations, 1 Feb 2020. Chenggang Yan, Biao Gong, Yuxuan Wei, Yue Gao

Therefore, we try to introduce the multi-view deep neural network into the hash learning field, and design an efficient and innovative retrieval model, which has achieved a significant improvement in retrieval performance.

Image Retrieval, Retrieval
