Trending Research

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

opendrivelab/vista • 27 May 2024

In this paper, we present Vista, a generalizable driving world model with high fidelity and versatile controllability.

Autonomous Driving Video Generation

246

0.33 stars / hour

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

zhangzc21/dyntet • 27 Feb 2024

Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences.

144

0.33 stars / hour

Improving the Training of Rectified Flows

sangyun884/rfpp • 30 May 2024

In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low-NFE (number of function evaluations) setting.

Knowledge Distillation Numerical Integration +1

0.32 stars / hour

LLMs Meet Multimodal Generation and Editing: A Survey

yingqinghe/awesome-llms-meet-multimodal-generation • 29 May 2024

With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning.

Multimodal Generation

122

0.32 stars / hour

Improved Distribution Matching Distillation for Fast Image Synthesis

tianweiy/DMD2 • 23 May 2024

Recent approaches have shown promise in distilling diffusion models into efficient one-step generators.

Image Generation

219

0.30 stars / hour

Efficient Multimodal Large Language Models: A Survey

lijiannuist/efficient-multimodal-llms-survey • 17 May 2024

In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning.

Edge-computing Question Answering +1

115

0.30 stars / hour

Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras

huajianup/photo-slam • 28 Nov 2023

In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance.

Neural Rendering

187

0.29 stars / hour

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

potamides/detikzify • 24 May 2024

Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy.

Language Modelling

0.29 stars / hour

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

wangyuchi369/InstructAvatar • 24 May 2024

Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable.

0.29 stars / hour

Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

black-yt/weathergft • 22 May 2024

Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range forecasting and nowcasting.

Weather Forecasting

0.28 stars / hour
