Language Models

Galactica

Introduced by Taylor et al. in Galactica: A Large Language Model for Science

Galactica is a language model that uses a Transformer architecture in a decoder-only setup, with the following modifications (a minimal sketch of these choices appears below):

  • It uses GeLU activations for all model sizes
  • It uses a 2,048-token context window for all model sizes
  • It does not use biases in any of the dense kernels or layer norms
  • It uses learned positional embeddings
  • It uses a vocabulary of 50k tokens constructed with BPE, built from a randomly selected 2% subset of the training data
Source: Galactica: A Large Language Model for Science
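
To make the list above concrete, here is a minimal, hypothetical PyTorch sketch of a decoder-only model wired up with these choices: GeLU in the feed-forward layers, no bias terms in the dense projections or layer norms, learned positional embeddings over a 2,048-token context, and a ~50k-entry vocabulary. The class names and layer sizes are illustrative and are not taken from the released Galactica code; `bias=False` on `nn.LayerNorm` requires PyTorch 2.1 or newer.

```python
# Illustrative sketch only; hyperparameters are placeholders, not Galactica's.
import torch
import torch.nn as nn

MAX_CONTEXT = 2048  # context window length used for all model sizes


class DecoderBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        # Layer norms without bias terms ("no biases in layer norms"); needs PyTorch >= 2.1.
        self.ln1 = nn.LayerNorm(d_model, bias=False)
        self.ln2 = nn.LayerNorm(d_model, bias=False)
        # Attention projections without biases ("no biases in dense kernels").
        self.attn = nn.MultiheadAttention(d_model, n_heads, bias=False, batch_first=True)
        # Feed-forward network with GeLU activation and bias-free dense layers.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Causal mask for the decoder-only (autoregressive) setup.
        causal = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + attn_out
        x = x + self.ffn(self.ln2(x))
        return x


class TinyGalacticaLikeLM(nn.Module):
    def __init__(self, vocab_size: int = 50_000, d_model: int = 256,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)    # ~50k BPE vocabulary
        self.pos_emb = nn.Embedding(MAX_CONTEXT, d_model)    # learned positional embeddings
        self.blocks = nn.ModuleList([DecoderBlock(d_model, n_heads) for _ in range(n_layers)])
        self.ln_f = nn.LayerNorm(d_model, bias=False)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.tok_emb(token_ids) + self.pos_emb(positions)
        for block in self.blocks:
            x = block(x)
        return self.lm_head(self.ln_f(x))  # next-token logits


if __name__ == "__main__":
    model = TinyGalacticaLikeLM()
    logits = model(torch.randint(0, 50_000, (1, 16)))
    print(logits.shape)  # torch.Size([1, 16, 50000])
```

The BPE tokenizer itself is assumed to be trained separately (on the 2% data subset) and is outside the scope of this sketch.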
