Search Results for author: Patrick Esser

Found 17 papers, 11 papers with code

Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

no code implementations • 18 Mar 2024 • Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach

Distillation methods, like the recently introduced adversarial diffusion distillation (ADD), aim to shift the model from many-shot to single-step inference, albeit at the cost of expensive and difficult optimization due to its reliance on a fixed pretrained DINOv2 discriminator.

Image Generation

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

1 code implementation • 5 Mar 2024 • Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach

Rectified flow is a recent generative model formulation that connects data and noise in a straight line.

Reading Comprehension Text-to-Image Generation

Structure and Content-Guided Video Synthesis with Diffusion Models

no code implementations • ICCV 2023 • Patrick Esser, Johnathan Chiu, Parmida Atighehchian, Jonathan Granskog, Anastasis Germanidis

Text-guided generative diffusion models unlock powerful image creation and editing tools.

Disentanglement Text-to-Video Generation +1

Towards Unified Keyframe Propagation Models

1 code implementation • 19 May 2022 • Patrick Esser, Peter Michael, Soumyadip Sengupta

We evaluate our two-stream approach for inpainting tasks, where experiments show that it improves both the propagation of features within a single frame as required for image inpainting, as well as their propagation from keyframes to target frames.

Image Inpainting Video Editing +1

230

High-Resolution Image Synthesis with Latent Diffusion Models

33 code implementations • CVPR 2022 • Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond.

Ranked #2 on Layout-to-Image Generation on COCO-Stuff 256x256

Denoising Image Inpainting +5

65,494

ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

no code implementations • NeurIPS 2021 • Patrick Esser, Robin Rombach, Andreas Blattmann, Björn Ommer

Thus, in contrast to pure autoregressive models, it can solve free-form image inpainting and, in the case of conditional models, local, text-guided image modification without requiring mask-specific training.

Ranked #4 on Text-to-Image Generation on Conceptual Captions

Image Inpainting Text-to-Image Generation

Geometry-Free View Synthesis: Transformers and no 3D Priors

1 code implementation • ICCV 2021 • Robin Rombach, Patrick Esser, Björn Ommer

Is a geometric model required to synthesize novel views from a single image?

Ranked #1 on Novel View Synthesis on RealEstate10K

Novel View Synthesis

361

Shape or Texture: Understanding Discriminative Features in CNNs

no code implementations • 27 Jan 2021 • Md Amirul Islam, Matthew Kowal, Patrick Esser, Sen Jia, Björn Ommer, Konstantinos G. Derpanis, Neil Bruce

Contrasting the previous evidence that neurons in the later layers of a Convolutional Neural Network (CNN) respond to complex object shapes, recent studies have shown that CNNs actually exhibit a 'texture bias': given an image with both texture and shape cues (e.g., a stylized image), a CNN is biased towards predicting the category corresponding to the texture.

Shape or Texture: Disentangling Discriminative Features in CNNs

no code implementations • ICLR 2021 • Md Amirul Islam, Matthew Kowal, Patrick Esser, Sen Jia, Björn Ommer, Konstantinos G. Derpanis, Neil Bruce

Contrasting the previous evidence that neurons in the later layers of a Convolutional Neural Network (CNN) respond to complex object shapes, recent studies have shown that CNNs actually exhibit a 'texture bias': given an image with both texture and shape cues (e.g., a stylized image), a CNN is biased towards predicting the category corresponding to the texture.

Taming Transformers for High-Resolution Image Synthesis

12 code implementations • CVPR 2021 • Patrick Esser, Robin Rombach, Björn Ommer

We demonstrate how combining the effectiveness of the inductive bias of CNNs with the expressivity of transformers enables them to model and thereby synthesize high-resolution images.

Ranked #3 on Text-to-Image Generation on LHQC

DeepFake Detection Image Outpainting +4

5,379

A Note on Data Biases in Generative Models

1 code implementation • 4 Dec 2020 • Patrick Esser, Robin Rombach, Björn Ommer

It is tempting to think that machines are less prone to unfairness and prejudice.

219

Unsupervised Part Discovery by Unsupervised Disentanglement

1 code implementation • 9 Sep 2020 • Sandro Braun, Patrick Esser, Björn Ommer

Our approach leverages a generative model consisting of two disentangled representations for an object's shape and appearance and a latent variable for the part segmentation.

Disentanglement Segmentation

Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs

1 code implementation • ECCV 2020 • Robin Rombach, Patrick Esser, Björn Ommer

To open such a black box, it is, therefore, crucial to uncover the different semantic concepts a model has learned as well as those that it has learned to be invariant to.

Network-to-Network Translation with Conditional Invertible Neural Networks

1 code implementation • NeurIPS 2020 • Robin Rombach, Patrick Esser, Björn Ommer

Given the ever-increasing computational costs of modern machine learning models, we need to find new ways to reuse such expert models and thus tap into the resources that have been invested in their creation.

Image-to-Image Translation Text-to-Image Generation +1

219

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

2 code implementations • CVPR 2020 • Patrick Esser, Robin Rombach, Björn Ommer

We formulate interpretation as a translation of hidden representations onto semantic concepts that are comprehensible to the user.

Image Generation Image Manipulation

120

Unsupervised Robust Disentangling of Latent Characteristics for Image Synthesis

no code implementations • ICCV 2019 • Patrick Esser, Johannes Haux, Björn Ommer

In experiments on diverse object categories, the approach successfully recombines pose and appearance to reconstruct and retarget novel synthesized images.

Disentanglement Image Generation +1

A Variational U-Net for Conditional Appearance and Shape Generation

2 code implementations • CVPR 2018 • Patrick Esser, Ekaterina Sutter, Björn Ommer

Experiments show that the model enables conditional image generation and transfer.

Conditional Image Generation

497
