MambaOut: Do We Really Need Mamba for Vision?

yuweihao/mambaout 13 May 2024

For vision tasks, as image classification does not align with either characteristic, we hypothesize that Mamba is not necessary for this task. Detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks.

Image Classification Instance Segmentation +2

1,519
5.77 stars / hour

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

tencent/hunyuandit 14 May 2024

For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images.

Image Generation Language Modelling +2

1,565
4.48 stars / hour

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

idea-research/grounding-dino-1.5-api 16 May 2024

Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection.

Edge-computing object-detection +1

215
3.07 stars / hour

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

opengvlab/internvl 25 Apr 2024

Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.

4k Language Modelling +3

2,428
2.41 stars / hour

A decoder-only foundation model for time-series forecasting

google-research/timesfm 14 Oct 2023

Motivated by recent advances in large language models for Natural Language Processing (NLP), we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of public datasets comes close to the accuracy of state-of-the-art supervised forecasting models for each individual dataset.

Decoder Time Series +1

2,234
1.52 stars / hour

How Far Are We From AGI

ulab-uiuc/agi-survey 16 May 2024

The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors.

140
1.50 stars / hour

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

x-lance/anitalker 6 May 2024

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait.

Metric Learning Self-Supervised Learning

879
1.47 stars / hour

210
1.38 stars / hour

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

alpha-vllm/lumina-t2x 9 May 2024

Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details.

1,055
1.22 stars / hour

KAN: Kolmogorov-Arnold Networks

Blealtan/efficient-kan 30 Apr 2024

Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).

2,576
1.12 stars / hour