TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Linguistic Acceptability	CoLA	ELC-BERT-base 98M	Accuracy	82.6	# 7
Linguistic Acceptability	CoLA	ELC-BERT-small 24M	Accuracy	76.1	# 11
Linguistic Acceptability	CoLA	LTG-BERT-small 24M	Accuracy	77.6	# 10
Linguistic Acceptability	CoLA	LTG-BERT-base 98M	Accuracy	82.7	# 6
Natural Language Inference	MultiNLI	ELC-BERT-small 24M	Matched	79.2	# 41
Natural Language Inference	MultiNLI	ELC-BERT-small 24M	Mismatched	79.9	# 31
Natural Language Inference	MultiNLI	LTG-BERT-small 24M	Matched	78	# 42
Natural Language Inference	MultiNLI	LTG-BERT-small 24M	Mismatched	78.8	# 32
Natural Language Inference	MultiNLI	LTG-BERT-base 98M	Matched	83	# 34
Natural Language Inference	MultiNLI	LTG-BERT-base 98M	Mismatched	83.4	# 22
Natural Language Inference	MultiNLI	ELC-BERT-base 98M (zero init)	Matched	84.4	# 30
Natural Language Inference	MultiNLI	ELC-BERT-base 98M (zero init)	Mismatched	84.5	# 19
Natural Language Inference	RTE	ELC-BERT-small 24M	Accuracy	55.4	# 82
Natural Language Inference	RTE	ELC-BERT-base 98M (zero init)	Accuracy	63	# 67
Natural Language Inference	RTE	LTG-BERT-base 98M	Accuracy	54.7	# 84
Natural Language Inference	RTE	LTG-BERT-small 24M	Accuracy	53.7	# 87

Not all layers are equally as important: Every Layer Counts BERT

3 Nov 2023 · Lucas Georges Gabriel Charpentier, David Samuel

This paper introduces a novel modification of the transformer architecture, tailored for the data-efficient pretraining of language models. We evaluate this data efficiency by participating in the BabyLM challenge, where our solution won both the strict and strict-small tracks. Our approach allows each transformer layer to select which outputs of previous layers to process. The empirical results verify the potential of this simple modification and show that not all layers are equally as important.
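
To make the idea of a layer "selecting" previous outputs concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, so the per-layer scalar weights (`layer_weights`), the function names, and the toy layers are all hypothetical. In the sketch, each layer reads a weighted sum of the embedding output and every earlier layer's output, where a real model would learn those weights during pretraining.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def combine_previous(outputs: List[Vector], weights: List[float]) -> Vector:
    """Weighted sum of all earlier outputs (embedding plus each prior layer).

    Assumption: one scalar weight per earlier output. The paper's actual
    parameterization is not stated in the abstract.
    """
    if len(outputs) != len(weights):
        raise ValueError("need exactly one weight per earlier output")
    dim = len(outputs[0])
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


def forward(embedding: Vector,
            layers: List[Layer],
            layer_weights: List[List[float]]) -> Vector:
    """Run a stack in which layer k reads a weighted mix of outputs 0..k-1."""
    outputs = [embedding]  # index 0 holds the embedding output
    for layer, weights in zip(layers, layer_weights):
        mixed = combine_previous(outputs, weights)
        outputs.append(layer(mixed))
    return outputs[-1]


# Toy usage: two stand-in "layers" acting on a 2-dimensional vector.
double: Layer = lambda x: [2.0 * v for v in x]
negate: Layer = lambda x: [-v for v in x]

# The first layer sees only the embedding; the second layer weighs the
# embedding and the first layer's output equally.
result = forward([1.0, 2.0], [double, negate], [[1.0], [0.5, 0.5]])
```

Inspecting weights of this kind after training is one way to see which earlier outputs a given layer relies on, which is the sort of evidence the abstract's closing claim points to.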

Tasks

Linguistic Acceptability

Natural Language Inference

Datasets

GLUE

MultiNLI

CoLA

SuperGLUE

BLiMP

RTE

Results from the Paper

Ranked #6 on Linguistic Acceptability on CoLA
