Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

openai/grok 6 Jan 2022

In this paper we propose to study generalization of neural networks on small algorithmically generated datasets.

Memorization

2,046
7.17 stars / hour

LLM4Decompile: Decompiling Binary Code with Large Language Models

albertan017/LLM4Decompile 8 Mar 2024

Therefore, we release the first open-access decompilation LLMs ranging from 1B to 33B pre-trained on 4 billion tokens of C source code and the corresponding assembly code.

1,294
4.96 stars / hour

Chronos: Learning the Language of Time Series

amazon-science/chronos-forecasting 12 Mar 2024

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.

Gaussian Processes, Language Modelling, +2

648
2.79 stars / hour

Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

mayuelala/followyourclick 13 Mar 2024

Despite recent advances in image-to-video generation, better controllability and local animation are less explored.

Image Animation, Image to Video Generation

508
2.76 stars / hour

Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators

trevorpogue/algebraic-nnhw 20 Nov 2023

We introduce a new algorithm called the Free-pipeline Fast Inner Product (FFIP) and its hardware architecture that improve an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968.

228
1.90 stars / hour

DeepSeek-VL: Towards Real-World Vision-Language Understanding

deepseek-ai/deepseek-vl 8 Mar 2024

The DeepSeek-VL family (both 1.3B and 7B models) showcases superior user experiences as a vision-language chatbot in real-world applications, achieving state-of-the-art or competitive performance across a wide range of visual-language benchmarks at the same model size while maintaining robust performance on language-centric benchmarks.

Chatbot, Language Modelling, +3

1,245
1.81 stars / hour

GiT: Towards Generalist Vision Transformer through Universal Language Interface

haiyang-w/git 14 Mar 2024

Due to its simple design, this paradigm holds promise for narrowing the architectural gap between vision and language.

Language Modelling

155
1.57 stars / hour

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

ironjr/streammultidiffusion 14 Mar 2024

The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing.

Text-to-Image Generation

124
1.43 stars / hour

GSPMD: General and Scalable Parallelization for ML Computation Graphs

apple/axlearn 10 May 2021

We present GSPMD, an automatic, compiler-based parallelization system for common machine learning computations.

Playing the Game of 2048

804
1.33 stars / hour

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

openai/transformer-debugger 1 Nov 2022

Research in mechanistic interpretability seeks to explain behaviors of machine learning models in terms of their internal components.

Language Modelling

3,218
1.25 stars / hour