Mixture of Softmaxes (MoS) computes $K$ softmax distributions and combines them as a weighted mixture. The motivation is that the traditional softmax suffers from a softmax bottleneck: because the output distribution is produced by a dot product between a context vector and word embeddings followed by a single softmax, the matrix of log-probabilities the model can express is limited to the rank of the embedding dimension, which constrains the conditional probabilities it can represent. A mixture of softmaxes is not bound by this rank constraint, so it can model the conditional probability more expressively.
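Concretely, the paper defines the next-token distribution as

$$P_\theta(x \mid c) = \sum_{k=1}^{K} \pi_{c,k} \, \frac{\exp\left(h_{c,k}^\top w_x\right)}{\sum_{x'} \exp\left(h_{c,k}^\top w_{x'}\right)}, \qquad \sum_{k=1}^{K} \pi_{c,k} = 1,$$

where $\pi_{c,k}$ is the mixture weight of the $k$-th component for context $c$ and $h_{c,k}$ is the $k$-th context vector. Below is a minimal PyTorch sketch of this output layer; the class and parameter names (`MixtureOfSoftmaxes`, `n_mix`, `prior`, `latent`, `decoder`) are illustrative, not taken from the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    """Sketch of a Mixture of Softmaxes (MoS) output layer.

    Replaces the single softmax over the vocabulary with a weighted
    mixture of K softmaxes, lifting the rank constraint of the
    standard softmax output layer.
    """

    def __init__(self, hidden_dim, embed_dim, vocab_size, n_mix=5):
        super().__init__()
        self.n_mix = n_mix
        self.embed_dim = embed_dim
        # Prior network: one mixture-weight logit per component.
        self.prior = nn.Linear(hidden_dim, n_mix)
        # Projects the hidden state to K component context vectors.
        self.latent = nn.Linear(hidden_dim, n_mix * embed_dim)
        # Output embedding shared by all K components.
        self.decoder = nn.Linear(embed_dim, vocab_size)

    def forward(self, g):
        # g: (batch, hidden_dim) final hidden state of the RNN.
        batch = g.size(0)
        # Mixture weights pi_{c,k}: (batch, n_mix), sum to 1 per example.
        pi = F.softmax(self.prior(g), dim=-1)
        # Component contexts h_{c,k} = tanh(W_k g): (batch, n_mix, embed_dim).
        h = torch.tanh(self.latent(g)).view(batch, self.n_mix, self.embed_dim)
        # Per-component distributions: (batch, n_mix, vocab_size).
        probs = F.softmax(self.decoder(h), dim=-1)
        # Mix in probability space: P(x|c) = sum_k pi_k * softmax_k(x).
        return torch.einsum('bk,bkv->bv', pi, probs)

# Example usage: each row of p is a valid distribution over the vocabulary.
mos = MixtureOfSoftmaxes(hidden_dim=650, embed_dim=280, vocab_size=10000)
p = mos(torch.randn(32, 650))  # (32, 10000), rows sum to 1
```

Note that the mixing must happen in probability space, not logit space; for NLL training, take the log of the mixture output (clamped or with a small epsilon for numerical stability).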
Source: Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 2 | 22.22% |
| Machine Translation | 2 | 22.22% |
| Translation | 2 | 22.22% |
| Tree Decomposition | 1 | 11.11% |
| Image Captioning | 1 | 11.11% |
| Text Generation | 1 | 11.11% |