Search Results for author: Yuren Cong

Found 9 papers, 3 papers with code

WorldAfford: Affordance Grounding based on Natural Language Instructions

no code implementations · 21 May 2024 · Changmao Chen, Yuren Cong, Zhen Kan

In particular, WorldAfford can localize the affordance regions of multiple objects and provide an alternative when objects in the environment cannot fully match the given instruction.

Segment Any Object Model (SAOM): Real-to-Simulation Fine-Tuning Strategy for Multi-Class Multi-Instance Segmentation

no code implementations · 16 Mar 2024 · Mariia Khan, Yue Qiu, Yuren Cong, Jumana Abu-Khalaf, David Suter, Bodo Rosenhahn

The foundational Segment Anything Model (SAM) is designed for promptable multi-class multi-instance segmentation, but in its "everything" mode it tends to output part or sub-part masks in many real-world applications (see the usage sketch below).

Instance Segmentation · Object · +3
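
For orientation, SAM's "everything" mode is exposed through the automatic mask generator in the official segment_anything package. A minimal usage sketch, with a placeholder checkpoint path and image file:

```python
# Minimal sketch of SAM's "everything" mode via the official
# segment_anything package; the checkpoint path is a placeholder.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="path/to/sam_vit_h.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)  # HxWx3 RGB
masks = mask_generator.generate(image)  # list of dicts: "segmentation", "area", ...
# In practice these masks often cover parts or sub-parts of objects,
# which is the behavior SAOM's fine-tuning strategy targets.
```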

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

no code implementations · 9 Oct 2023 · Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He

In this paper, for the first time, we introduce optical flow into the attention module of the diffusion model's U-Net to address the inconsistency issue in text-to-video editing (see the simplified attention sketch below).

Optical Flow Estimation · Text-to-Video Editing · +1
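
The entry only names the mechanism, so the following is a rough, self-contained sketch of the idea of flow-guided attention: patch features lying on the same optical-flow trajectory attend to each other across frames. The precomputed `trajectories` input and all shapes are assumptions for illustration, not the paper's implementation:

```python
# Illustrative flow-guided temporal attention: patches on the same optical-flow
# trajectory attend to each other across frames. The precomputed `trajectories`
# input and all shapes are assumptions, not the paper's implementation.
import torch

def flow_guided_attention(feats, trajectories):
    """feats: (T, N, C) patch features for T frames with N patches each.
    trajectories: (T, N) long tensor; trajectories[t, i] is the index of the
    patch in frame t on trajectory i (each row is a permutation of 0..N-1)."""
    T, N, C = feats.shape
    idx = trajectories.unsqueeze(-1).expand(T, N, C)
    traj_feats = torch.gather(feats, 1, idx)            # (T, N, C), grouped by trajectory
    q = traj_feats.permute(1, 0, 2)                     # (N, T, C): one sequence per trajectory
    attn = torch.softmax(q @ q.transpose(1, 2) / C ** 0.5, dim=-1)  # (N, T, T)
    out = (attn @ q).permute(1, 0, 2)                   # attended features, back to (T, N, C)
    return torch.zeros_like(feats).scatter_(1, idx, out)  # restore original patch order

feats = torch.randn(8, 16, 64)                          # 8 frames, 16 patches, dim 64
traj = torch.stack([torch.randperm(16) for _ in range(8)])
out = flow_guided_attention(feats, traj)                # (8, 16, 64)
```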

SPAN: Learning Similarity between Scene Graphs and Images with Transformers

1 code implementation · 2 Apr 2023 · Yuren Cong, Wentong Liao, Bodo Rosenhahn, Michael Ying Yang

Learning similarity between scene graphs and images aims to estimate a similarity score for a given scene-graph-image pair (see the scoring sketch below).

Contrastive Learning · Graph Generation · +3
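
As a minimal illustration of the task interface (not SPAN's architecture), a similarity score can be read off as the cosine similarity between pooled graph and image embeddings produced by two encoders, which are placeholders here:

```python
# Hypothetical scoring interface for scene-graph-image similarity; the
# embeddings would come from a graph encoder and an image encoder (both
# placeholders here), not from SPAN's actual Transformer design.
import torch
import torch.nn.functional as F

def similarity_score(graph_emb, image_emb):
    """graph_emb, image_emb: (B, D) pooled embeddings. Returns (B,) cosine
    similarity scores in [-1, 1], higher meaning a better graph-image match."""
    return F.cosine_similarity(graph_emb, image_emb, dim=-1)

scores = similarity_score(torch.randn(4, 256), torch.randn(4, 256))
```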

Attribute-Centric Compositional Text-to-Image Generation

no code implementations · 4 Jan 2023 · Yuren Cong, Martin Renqiang Min, Li Erran Li, Bodo Rosenhahn, Michael Ying Yang

We further propose an attribute-centric contrastive loss to avoid overfitting to overrepresented attribute compositions (see the generic loss sketch below).

Attribute · Fairness · +1
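
For orientation only, a generic InfoNCE-style contrastive loss is sketched below; the paper's attribute-centric formulation and any reweighting of overrepresented compositions are not reproduced here:

```python
# Generic InfoNCE-style contrastive loss, shown only as the family of
# objectives the entry refers to; the attribute-centric weighting from
# the paper is not reproduced here.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.07):
    """anchors, positives: (B, D) paired embeddings; row i of each forms a
    positive pair, and all other rows in the batch act as negatives."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.t() / temperature                 # (B, B) scaled cosine similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)           # match each anchor to its positive

loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```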

SSGVS: Semantic Scene Graph-to-Video Synthesis

no code implementations · 11 Nov 2022 · Yuren Cong, Jinhui Yi, Bodo Rosenhahn, Michael Ying Yang

A semantic scene graph-to-video synthesis framework (SSGVS), built on a pre-trained video scene graph (VSG) encoder, a VQ-VAE, and an auto-regressive Transformer, is proposed to synthesize a video from an initial scene image and a variable number of semantic scene graphs (see the pipeline sketch below).

Image Generation
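
A schematic sketch of the synthesis loop described above: quantize the initial frame, condition on one scene-graph embedding per frame, and autoregressively predict each frame's tokens. All module names, sizes, and the greedy decoding are illustrative assumptions, not the paper's code:

```python
# Schematic SSGVS-style loop; placeholder modules stand in for the VSG
# encoder and the VQ-VAE codebook.
import torch
import torch.nn as nn

class SSGVSSketch(nn.Module):
    def __init__(self, vocab=1024, dim=256, tokens_per_frame=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, dim)       # stands in for VQ-VAE codebook ids
        self.graph_proj = nn.Linear(dim, dim)         # stands in for the VSG encoder output
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(dim, vocab)
        self.tpf = tokens_per_frame

    @torch.no_grad()
    def generate(self, init_tokens, graph_feats, n_frames):
        """init_tokens: (B, tpf) VQ ids of the initial frame.
        graph_feats: (B, n_frames, dim), one embedding per semantic scene graph.
        Returns (B, n_frames, tpf) token ids, to be decoded by a VQ-VAE decoder."""
        seq = init_tokens
        for f in range(n_frames):
            cond = self.graph_proj(graph_feats[:, f:f + 1])   # condition on graph f
            for _ in range(self.tpf):                         # greedy next-token decoding
                h = self.decoder(self.tok_emb(seq), memory=cond)
                nxt = self.head(h[:, -1]).argmax(-1, keepdim=True)
                seq = torch.cat([seq, nxt], dim=1)
        new = seq[:, init_tokens.shape[1]:]
        return new.view(new.size(0), n_frames, self.tpf)

model = SSGVSSketch()
ids = model.generate(torch.randint(0, 1024, (2, 64)),  # initial-frame tokens
                     torch.randn(2, 3, 256), n_frames=3)
```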

RelTR: Relation Transformer for Scene Graph Generation

1 code implementation · 27 Jan 2022 · Yuren Cong, Michael Ying Yang, Bodo Rosenhahn

Different objects in the same scene are more or less related to each other, but only a limited number of these relationships are noteworthy.

Decoder · Graph Generation · +5

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

1 code implementation · ICCV 2021 · Yuren Cong, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn, Michael Ying Yang

Compared to scene graph generation from images, dynamic scene graph generation is more challenging because the relationships between objects change over time and the temporal dependencies between frames allow for a richer semantic interpretation.

Decoder · Scene Graph Generation · +2
