LLaMA

Introduced by Touvron et al. in LLaMA: Open and Efficient Foundation Language Models

LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. It is based on the transformer architecture, with several improvements that were proposed after the original design. The main differences from the original architecture are listed below.

RMSNorm is used to improve training stability by normalizing the input of each transformer sub-layer instead of its output.
The ReLU non-linearity is replaced by the SwiGLU activation function to improve performance.
Absolute positional embeddings are removed; instead, rotary positional embeddings (RoPE) are added at each layer of the network.
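
The three changes above can be illustrated with a minimal sketch in plain Python on single vectors. This is not the authors' implementation: the function names (rms_norm, pre_norm_residual, swiglu_ffn, rope), the eps and base defaults, and the choice to rotate consecutive pairs of features in rope are assumptions made for illustration. Real implementations work on batched tensors and apply the rotation to the query and key vectors inside attention.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """Divide x by its root mean square, then apply a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * inv_rms for g, v in zip(gain, x)]

def pre_norm_residual(x, gain, sublayer):
    """Pre-normalization: normalize the sub-layer input, then add the residual."""
    y = sublayer(rms_norm(x, gain))
    return [a + b for a, b in zip(x, y)]

def silu(z):
    """SiLU (Swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Feed-forward block with SwiGLU: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def rope(x, position, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * base**(-2i/d).

    Assumes len(x) is even; the pairing convention is an illustrative choice.
    """
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

if __name__ == "__main__":
    q = [1.0, 0.5, -0.3, 2.0]
    k = [0.2, -1.0, 0.7, 0.1]
    # The score between rotated vectors depends only on the position offset.
    score_a = sum(a * b for a, b in zip(rope(q, 3), rope(k, 1)))
    score_b = sum(a * b for a, b in zip(rope(q, 5), rope(k, 3)))
    print(score_a, score_b)
```

Because rope only rotates pairs of features, it leaves vector norms unchanged, and the dot product between a rotated query and a rotated key depends on the difference of their positions rather than on the positions themselves. That is why no separate absolute position embedding is needed.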

Source: LLaMA: Open and Efficient Foundation Language Models

Tasks

Task	Papers	Share
Language Modelling	99	13.41%
Large Language Model	58	7.86%
Question Answering	34	4.61%
Quantization	26	3.52%
Text Generation	26	3.52%
In-Context Learning	23	3.12%
Instruction Following	22	2.98%
Retrieval	20	2.71%
Code Generation	18	2.44%

Categories

Language Models