LayerDrop is a form of structured dropout for Transformer models that has a regularization effect during training and allows for efficient pruning at inference time. During training, it randomly drops layers from the Transformer; at inference time, shallower sub-networks can be extracted with an "every other" strategy, where pruning at rate $p$ means dropping the layers at depth $d$ such that $d \equiv 0 \pmod{\lfloor 1/p \rfloor}$.
Source: Reducing Transformer Depth on Demand with Structured Dropout
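To make the two rules above concrete, here is a minimal PyTorch sketch of both behaviors: independent random layer dropping during training, and deterministic "every other" pruning at inference. This is not the authors' fairseq implementation; the class name `LayerDropEncoder`, the 1-based depth indexing, and the demo hyperparameters are illustrative assumptions. Because Transformer layers are residual, a dropped layer is simply skipped and the input passes through unchanged.

```python
import torch
import torch.nn as nn

class LayerDropEncoder(nn.Module):
    """Stack of Transformer layers with LayerDrop (illustrative sketch).

    Training: each layer is skipped independently with probability p.
    Inference: "every other" pruning drops layers at depth d with
    d == 0 (mod floor(1/p)), matching the formula above.
    Depth is taken as 1-based here (an assumption, not from the source).
    """

    def __init__(self, layers, p=0.25):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.p = p

    def forward(self, x):
        if self.training:
            # Training-time structured dropout: skip each layer with prob. p.
            for layer in self.layers:
                if torch.rand(1).item() < self.p:
                    continue  # layer dropped; residual path carries x through
                x = layer(x)
        else:
            # Inference-time pruning: drop every floor(1/p)-th layer.
            stride = int(1 / self.p)
            for d, layer in enumerate(self.layers, start=1):
                if d % stride == 0:
                    continue  # pruned layer
                x = layer(x)
        return x

# Usage sketch with 8 standard encoder layers and p = 0.25,
# so inference keeps 6 of 8 layers (depths 4 and 8 are pruned).
layers = [nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
          for _ in range(8)]
model = LayerDropEncoder(layers, p=0.25)
x = torch.randn(2, 10, 64)  # (batch, sequence, model dim)

model.train()
out = model(x)  # random layers dropped this pass
model.eval()
out = model(x)  # deterministic every-other pruning
```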
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 2 | 18.18% |
| Machine Translation | 2 | 18.18% |
| Translation | 2 | 18.18% |
| Cross-Modal Retrieval | 1 | 9.09% |
| Retrieval | 1 | 9.09% |
| Multi-Task Learning | 1 | 9.09% |
| Open-Domain Question Answering | 1 | 9.09% |
| Question Answering | 1 | 9.09% |