Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

opendrivelab/vista 27 May 2024

In this paper, we present Vista, a generalizable driving world model with high fidelity and versatile controllability.

Autonomous Driving Video Generation

246
0.33 stars / hour

Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis

zhangzc21/dyntet 27 Feb 2024

Recent works in implicit representations, such as Neural Radiance Fields (NeRF), have advanced the generation of realistic and animatable head avatars from video sequences.

144
0.33 stars / hour

Improving the Training of Rectified Flows

sangyun884/rfpp 30 May 2024

In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting.

Knowledge Distillation, Numerical Integration (+1 more)

25
0.32 stars / hour

LLMs Meet Multimodal Generation and Editing: A Survey

yingqinghe/awesome-llms-meet-multimodal-generation 29 May 2024

With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning.

multimodal generation

122
0.32 stars / hour

Improved Distribution Matching Distillation for Fast Image Synthesis

tianweiy/DMD2 23 May 2024

Recent approaches have shown promise in distilling diffusion models into efficient one-step generators.

Image Generation

219
0.30 stars / hour

Efficient Multimodal Large Language Models: A Survey

lijiannuist/efficient-multimodal-llms-survey 17 May 2024

In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning.

Edge-computing, Question Answering (+1 more)

115
0.30 stars / hour

Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras

huajianup/photo-slam 28 Nov 2023

In addition to actively densifying hyper primitives based on geometric features, we further introduce a Gaussian-Pyramid-based training method to progressively learn multi-level features, enhancing photorealistic mapping performance.

Neural Rendering

187
0.29 stars / hour

DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ

potamides/detikzify 24 May 2024

Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy.

Language Modelling

55
0.29 stars / hour

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

wangyuchi369/InstructAvatar 24 May 2024

Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable.

97
0.29 stars / hour

Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

black-yt/weathergft 22 May 2024

Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range forecasting and nowcasting.

Weather Forecasting

27
0.28 stars / hour