Trending Research

ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing

poloclub/clickdiffusion • 5 Apr 2024

We demonstrate that by serializing both an image and a multi-modal instruction into a textual representation it is possible to leverage LLMs to perform precise transformations of the layout and appearance of an image.

Image Manipulation

0.33 stars / hour

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

RLHFlow/RLHF-Reward-Modeling • 13 Apr 2023

Utilizing a reward model and a sufficient number of samples, our approach selects the high-quality samples, discarding those that exhibit undesired behavior, and subsequently enhancing the model by fine-tuning on these filtered samples.

Ethics

195

0.33 stars / hour
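
The selection step described in the RAFT abstract (score samples with a reward model, keep the high-quality ones, fine-tune on what remains) might look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' code: `filter_by_reward`, `toy_reward`, and the `keep_fraction` parameter are hypothetical names, the reward is a toy word count standing in for a learned reward model, and the fine-tuning step itself is left out.

```python
import math

def filter_by_reward(samples, reward_fn, keep_fraction=0.25):
    """Rank samples by reward and keep the top fraction (hypothetical helper)."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Highest-reward samples first.
    ranked = sorted(samples, key=reward_fn, reverse=True)
    # Keep at least one sample when any are available.
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:n_keep]

def toy_reward(text):
    # Toy stand-in for a reward model: favour "please", penalise "rude".
    return text.count("please") - 2 * text.count("rude")

samples = ["please help", "rude reply", "plain reply", "please, please"]
# The retained samples would form the fine-tuning set in a later step.
finetune_set = filter_by_reward(samples, toy_reward, keep_fraction=0.5)
print(finetune_set)
```

Keeping a fixed top fraction is only one possible filtering rule; a reward threshold or a best-of-k choice per prompt would fit the same description.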

CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

i2-multimedia-lab/cdformer • 13 May 2024

Existing Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details.

Image Super-Resolution

0.32 stars / hour

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

hitsz-tmg/umoe-scaling-unified-multimodal-llms • 18 May 2024

Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities.

0.29 stars / hour

GIVT: Generative Infinite-Vocabulary Transformers

google-research/big_vision • 4 Dec 2023

We introduce generative infinite-vocabulary transformers (GIVT) which generate vector sequences with real-valued entries, instead of discrete tokens from a finite vocabulary.

Ranked #13 on Image Generation on ImageNet 256x256

Conditional Image Generation, Decoder, +2

1,787

0.29 stars / hour

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

assafelovic/gpt-researcher • 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

11,219

0.28 stars / hour

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

om-ai-lab/OmDet • 11 Mar 2024

End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities.

Object, object-detection, +2

0.28 stars / hour

Improving Diffusion Models for Virtual Try-on

yisol/IDM-VTON • 8 Mar 2024

Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.

Ranked #1 on Virtual Try-on on VITON-HD

Virtual Try-on

2,513

0.27 stars / hour

Kolmogorov-Arnold Networks are Radial Basis Function Networks

ZiyaoLi/fast-kan • 10 May 2024

This short paper is a fast proof-of-concept that the 3-order B-splines used in Kolmogorov-Arnold Networks (KANs) can be well approximated by Gaussian radial basis functions.

191

0.27 stars / hour
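
As a rough numerical check of the claim in the fast-kan abstract, the sketch below compares a single B-spline basis function with a Gaussian. It assumes "3-order" means cubic (degree 3) with uniform unit-spaced knots, and it picks the Gaussian by matching the variance of the cardinal cubic B-spline (1/3). That matching rule, and the names `cubic_bspline` and `gaussian_rbf`, are choices made for this illustration and may differ from the parameterization used in the paper or repository.

```python
import math

def cubic_bspline(x):
    """Cardinal cubic B-spline centred at 0 with knots at -2, -1, 0, 1, 2."""
    ax = abs(x)
    if ax < 1:
        return 2 / 3 - ax ** 2 + ax ** 3 / 2
    if ax < 2:
        return (2 - ax) ** 3 / 6
    return 0.0

def gaussian_rbf(x, variance=1 / 3):
    """Normalised Gaussian; the default variance matches the B-spline's (assumption)."""
    return math.exp(-x * x / (2 * variance)) / math.sqrt(2 * math.pi * variance)

# Evaluate both functions on a grid over [-3, 3] and record the worst gap.
grid = [i / 100 for i in range(-300, 301)]
max_err = max(abs(cubic_bspline(x) - gaussian_rbf(x)) for x in grid)
print(f"max abs difference on [-3, 3]: {max_err:.4f}")
```

The gap stays small relative to the basis function's peak value of 2/3, which is consistent with the abstract's statement that the approximation works well.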

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

alibaba-damo-academy/FunASR • 28 Nov 2021

In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set.

Action Detection, Activity Detection, +2

3,853

0.26 stars / hour
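
The power-set encoding mentioned in the diarization abstract can be illustrated with a small sketch: each frame's set of active speakers (a multi-label target) is mapped to one class index, so the model predicts a single label per frame. The bit-position indexing and the names `powerset_encode` and `powerset_decode` are assumptions made for this example; the paper may order or prune the classes differently.

```python
def powerset_encode(active):
    """Map a multi-hot tuple such as (1, 0, 1) to a single class index."""
    # Speaker i contributes 2**i when active, so every subset gets a unique index.
    return sum(bit << i for i, bit in enumerate(active))

def powerset_decode(label, num_speakers):
    """Recover the multi-hot speaker activity tuple from a class index."""
    return tuple((label >> i) & 1 for i in range(num_speakers))

# Four frames for three speakers: silence, speaker 0, speakers 0+1 overlapping, speaker 2.
frames = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1)]
labels = [powerset_encode(f) for f in frames]
print(labels)  # [0, 1, 3, 4]
```

With N speakers this gives 2^N classes, so overlapping speech becomes an ordinary class rather than several independent binary decisions.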
