Trending Research

EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training

leaplabthu/efficienttrain • • 14 May 2024

These patterns, when observed through frequency and spatial domains, incorporate lower-frequency components, and the natural image contents without distortion or data augmentation.

Data Augmentation Self-Supervised Learning

152

0.42 stars / hour

Paper
Code

Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

haozheliu-st/t-gate • • 3 Apr 2024

This study explores the role of cross-attention during inference in text-conditional diffusion models.

276

0.41 stars / hour

Paper
Code

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

deepseek-ai/deepseek-v2 • • 7 May 2024

MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.

Language Modelling Reinforcement Learning (RL)

2,233

0.40 stars / hour

Paper
Code

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

assafelovic/gpt-researcher • 22 Feb 2024

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages.

Retrieval

11,575

0.39 stars / hour

Paper
Code

From Sora What We Can See: A Survey of Text-to-Video Generation

soraw-ai/awesome-text-to-video-generation • • 17 May 2024

With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence.

Text-to-Video Generation Video Generation

0.38 stars / hour

Paper
Code

CDFormer:When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution

i2-multimedia-lab/cdformer • • 13 May 2024

Existing Blind image Super-Resolution (BSR) methods focus on estimating either kernel or degradation information, but have long overlooked the essential content details.

Image Super-Resolution

0.38 stars / hour

Paper
Code

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

alpha-vllm/lumina-t2x • • 9 May 2024

Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.

1,138

0.37 stars / hour

Paper
Code

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

modelscope/swift • • 18 Dec 2023

Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.

Decoder Text-to-Image Generation

1,615

0.35 stars / hour

Paper
Code

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

JackAILab/ConsistentID • • 25 Apr 2024

ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions.

577

0.35 stars / hour

Paper
Code

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

openbmb/minicpm • • 9 Apr 2024

For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation.

Domain Adaptation

4,031

0.35 stars / hour

Paper
Code