For fine-grained language understanding, we train a Multimodal Large Language Model to refine image captions.
We propose GLACE, which integrates pre-trained global and local encodings and enables scene coordinate regression (SCR) to scale to large scenes with only a single small-sized network.
We address the problem of regressing 3D human pose and shape from a single image, with a focus on 3D accuracy.
Image diffusion models have been utilized in various tasks, such as text-to-image generation and controllable image synthesis.
Diffusion models have demonstrated great success in text-to-video (T2V) generation.
Moreover, since current diffusion-based approaches are typically built on pre-trained text-to-image (T2I) models, training a video VAE without regard for compatibility with existing T2I models creates a latent-space gap between the two; bridging this gap requires substantial training compute, even when the T2I models are used as initialization.
In our GNN-RAG framework, the GNN acts as a dense subgraph reasoner to extract useful graph information, while the LLM leverages its natural language processing ability for ultimate KGQA.
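The division of labor described above (GNN as retriever over the subgraph, LLM as the final reasoner) can be sketched minimally as follows. This is an illustrative outline, not the authors' actual implementation: `gnn_score`, `llm_generate`, and the triple-keyed `subgraph` representation are hypothetical placeholders for the trained GNN, the LLM, and the extracted reasoning paths.

```python
def gnn_rag_answer(question, subgraph, gnn_score, llm_generate, top_k=2):
    """Hypothetical GNN-RAG-style pipeline for KGQA.

    subgraph maps each candidate answer node to the (head, relation, tail)
    triples on its reasoning path from the question entities.
    """
    # 1. The GNN acts as a dense subgraph reasoner: score candidate nodes.
    scores = {node: gnn_score(question, node) for node in subgraph}
    candidates = sorted(scores, key=scores.get, reverse=True)[:top_k]

    # 2. Verbalize the reasoning paths of the top-scored candidates.
    facts = [f"{h} --{r}--> {t}"
             for node in candidates
             for (h, r, t) in subgraph[node]]

    # 3. The LLM consumes the verbalized graph facts and answers in
    #    natural language.
    prompt = ("Question: " + question + "\n"
              "Knowledge graph facts:\n" + "\n".join(facts) + "\n"
              "Answer:")
    return llm_generate(prompt)
```

With stub scoring and generation functions, only the paths of the highest-scored candidates reach the LLM prompt, which mirrors the retrieve-then-reason split the framework describes.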
Large Language Models (LLMs) are often described as instances of foundation models: models that transfer strongly across various tasks and conditions in a few-shot or zero-shot manner, while exhibiting scaling laws that predict performance improvements as the pre-training scale increases.
While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale.
Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving.