SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

ailab-cvc/seed-x 22 Apr 2024

We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications.

Image Generation

114
0.70 stars / hour

Learning Visuotactile Skills with Two Multifingered Hands

ToruOwO/hato 25 Apr 2024

Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing.

32
0.80 stars / hour

AgentScope: A Flexible yet Robust Multi-Agent Platform

modelscope/agentscope 21 Feb 2024

With the rapid advancement of Large Language Models (LLMs), significant progress has been made in multi-agent applications.

599
0.75 stars / hour

Rethinking Inductive Biases for Surface Normal Estimation

baegwangbin/dsine 1 Mar 2024

Despite the growing demand for accurate surface normal estimation models, existing methods use general-purpose dense prediction models, adopting the same inductive biases as other tasks.

Surface Normal Estimation

498
0.74 stars / hour

SnapKV: LLM Knows What You are Looking for Before Generation

fasterdecoding/snapkv 22 Apr 2024

Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to baseline when processing inputs of 16K tokens.

16k

83
0.74 stars / hour

Dynamic Generation of Personalities with Large Language Models

hiyouga/llama-factory 10 Apr 2024

We propose a new metric to assess personality generation capability based on this evaluation method.

Personality Generation

20,050
0.73 stars / hour

ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation

hiyouga/llama-efficient-tuning 4 Aug 2023

Applying Reinforcement Learning (RL) to sequence generation models enables the direct optimization of long-term rewards (e.g., BLEU and human feedback), but typically requires large-scale sampling over a space of action sequences.

Abstractive Text Summarization, Language Modelling, +5

20,086
0.73 stars / hour

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

FoundationVision/VAR 3 Apr 2024

We present Visual AutoRegressive modeling (VAR), a new generation paradigm that redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".

Image Generation, Language Modelling, +2

2,984
0.69 stars / hour

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

4k, Language Modelling, +3

937
0.67 stars / hour

List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

zzxslp/som-llava 25 Apr 2024

Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V by enabling the model to associate visual objects with tags inserted on the image.

Visual Grounding, Visual Question Answering, +1

36
0.65 stars / hour