Trending Research

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

tencentarc/instantmesh • 10 Apr 2024

We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.

Image to 3D

1,733 stars

1.17 stars / hour

MolTC: Towards Molecular Relational Modeling In Language Models

MangoKiller/MolTC • 6 Feb 2024

Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research.

Relational Reasoning

126 stars

1.05 stars / hour

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases

snap-stanford/stark • 19 Apr 2024

Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information.

Benchmarking Retrieval

166 stars

0.91 stars / hour

Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

FoundationVision/Groma • 19 Apr 2024

We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability.

Language Modelling, Large Language Model

276 stars

0.81 stars / hour

QLoRA: Efficient Finetuning of Quantized LLMs

internlm/xtuner • NeurIPS 2023

Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU.

Chatbot, Instruction Following

2,360 stars

0.80 stars / hour

AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation

ez-hwh/autocrawler • 19 Apr 2024

We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding.

Action Generation

256 stars

0.77 stars / hour

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR • 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Ranked #7 on Image Generation on ImageNet 256x256

Image Generation, Language Modelling

3,194 stars

0.72 stars / hour

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

tothebeginning/pulid • 24 Apr 2024

We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation.

Text-to-Image Generation

308 stars

0.71 stars / hour

MultiBooth: Towards Generating All Your Concepts in an Image from Text

chenyangzhu1/multibooth • 22 Apr 2024

MultiBooth divides the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase.

Computational Efficiency, Image Generation

0.67 stars / hour

Llama 2: Open Foundation and Fine-Tuned Chat Models

flagalpha/llama2-chinese • 18 Jul 2023

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.

Ranked #2 on Question Answering on PubChemQA

Arithmetic Reasoning

11,398 stars

0.66 stars / hour
