Trending Research

SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

ailab-cvc/seed-x • 22 Apr 2024

We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.

Image Generation

114

0.70 stars / hour


Learning Visuotactile Skills with Two Multifingered Hands

ToruOwO/hato • 25 Apr 2024

Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing.

0.80 stars / hour


AgentScope: A Flexible yet Robust Multi-Agent Platform

modelscope/agentscope • 21 Feb 2024

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications.

599

0.75 stars / hour


Rethinking Inductive Biases for Surface Normal Estimation

baegwangbin/dsine • 1 Mar 2024

Despite the growing demand for accurate surface normal estimation models, existing methods use general-purpose dense prediction models, adopting the same inductive biases as other tasks.

Surface Normal Estimation

498

0.74 stars / hour


SnapKV: LLM Knows What You are Looking for Before Generation

fasterdecoding/snapkv • 22 Apr 2024

Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to baseline when processing inputs of 16K tokens.

16k

0.74 stars / hour


Dynamic Generation of Personalities with Large Language Models

hiyouga/llama-factory • 10 Apr 2024

We propose a new metric to assess personality generation capability based on this evaluation method.

Personality Generation

20,050

0.73 stars / hour


ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

hiyouga/llama-efficient-tuning • 4 Aug 2023

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization, Language Modelling, +5 more

20,086

0.73 stars / hour


Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR • 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Ranked #7 on Image Generation on ImageNet 256x256

Image Generation, Language Modelling, +2 more

2,984

0.69 stars / hour


How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl • 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

Ranked #6 on Visual Question Answering on MM-Vet

4k, Language Modelling, +3 more

937

0.67 stars / hour


List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

zzxslp/som-llava • 25 Apr 2024

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V by enabling the model to associate visual objects with tags inserted on the image.

Ranked #47 on Visual Question Answering on MM-Vet

Visual Grounding, Visual Question Answering, +1 more

0.65 stars / hour
