Trending Research

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

ibm-granite/granite-code-models • 7 May 2024

Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously.

Code Generation, Decoder

786

0.49 stars / hour

EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training

leaplabthu/efficienttrain • 14 May 2024

These patterns, when observed through the frequency and spatial domains, incorporate lower-frequency components and the natural image contents without distortion or data augmentation.

Data Augmentation, Self-Supervised Learning

105

0.49 stars / hour

SceneTracker: Long-term Scene Flow Estimation Network

wwsource/scenetracker • 29 Mar 2024

Considering the complementarity of scene flow estimation in the spatial domain's focusing capability and 3D object tracking in the temporal domain's coherence, this study aims to address a comprehensive new task that can simultaneously capture fine-grained and long-term 3D motion in an online manner: long-term scene flow estimation (LSFE).

3D Object Tracking, Object Tracking, +1

0.48 stars / hour

VILA: On Pre-training for Visual Language Models

efficient-large-model/vila • 12 Dec 2023

Visual language models (VLMs) rapidly progressed with the recent success of large language models.

Ranked #24 on Visual Question Answering on MM-Vet

In-Context Learning, Language Modelling, +2

744

0.46 stars / hour

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

tencentarc/instantmesh • 10 Apr 2024

We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.

Image to 3D

2,189

0.44 stars / hour

WavCraft: Audio Editing and Generation with Large Language Models

jinhualiang/wavcraft • 14 Mar 2024

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing.

In-Context Learning

477

0.41 stars / hour

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

xiaoduoailab/xmodelvlm • 15 May 2024

We introduce Xmodel-VLM, a cutting-edge multimodal vision language model.

Language Modelling

0.40 stars / hour

Vidur: A Large-Scale Simulation Framework For LLM Inference

microsoft/vidur • 8 May 2024

Vidur models the performance of LLM operators using a combination of experimental profiling and predictive modeling, and evaluates the end-to-end inference performance for different workloads by estimating several metrics of interest such as latency and throughput.

Scheduling

0.38 stars / hour

PHUDGE: Phi-3 as Scalable Judge

deshwalmahesh/PHUDGE • 12 May 2024

In this paper cum technical report, we present PHUDGE, a fine-tuned Phi-3 model that achieved SOTA results in 4 tasks (Feedback Test, Feedback OOD, MT Human, and Preference Test), surpassing every existing model in latency and throughput.

Data Augmentation

0.38 stars / hour

Improving Diffusion Models for Virtual Try-on

yisol/IDM-VTON • 8 Mar 2024

Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.

Ranked #1 on Virtual Try-on on VITON-HD

Virtual Try-on

2,496

0.37 stars / hour