no code implementations • 17 Sep 2020 • Po Li, Lei LI, Yan Fu, Jun Rong, Yu Zhang
On top of the MoE layer, we deploy a transformer layer for each task as a task tower to learn task-specific information.
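As a rough illustration of the layout described above (a shared MoE layer feeding one transformer-layer tower per task), here is a minimal NumPy sketch. The excerpt gives no architectural details, so everything concrete here is an assumption for illustration: the names `MoELayer`, `TaskTower`, and `forward`, the per-task softmax gating, the single-head attention, the dimensions, and the sigmoid output head. It should not be read as the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class MoELayer:
    """Shared experts with one softmax gate per task (assumed gating scheme)."""
    def __init__(self, d, n_experts, n_tasks, rng):
        s = 1.0 / np.sqrt(d)
        self.experts = rng.normal(0, s, (n_experts, d, d))
        self.gates = rng.normal(0, s, (n_tasks, d, n_experts))

    def __call__(self, x):  # x: (batch, seq, d)
        # Every expert transforms every position: (n_experts, batch, seq, d)
        h = np.maximum(np.einsum("bsd,edf->ebsf", x, self.experts), 0)
        outs = []
        for g in self.gates:
            w = softmax(x @ g)  # (batch, seq, n_experts), sums to 1 over experts
            outs.append(np.einsum("bse,ebsf->bsf", w, h))
        return outs  # one (batch, seq, d) tensor per task

class TaskTower:
    """One single-head transformer encoder layer plus a scalar output head."""
    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0, s, (d, d)) for _ in range(4)
        )
        self.w1 = rng.normal(0, s, (d, 4 * d))
        self.w2 = rng.normal(0, 1.0 / np.sqrt(4 * d), (4 * d, d))
        self.head = rng.normal(0, s, (d,))

    def __call__(self, x):  # x: (batch, seq, d)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1]))
        x = layer_norm(x + (attn @ v) @ self.wo)                   # self-attention block
        x = layer_norm(x + np.maximum(x @ self.w1, 0) @ self.w2)   # feed-forward block
        logit = x.mean(axis=1) @ self.head                         # mean-pool, then score
        return 1.0 / (1.0 + np.exp(-logit))                        # (batch,) probabilities

def forward(x, moe, towers):
    # Each task tower consumes its own gated mixture of the shared experts.
    return [tower(z) for tower, z in zip(towers, moe(x))]

rng = np.random.default_rng(0)
d, n_experts, n_tasks = 8, 3, 2          # illustrative sizes only
moe = MoELayer(d, n_experts, n_tasks, rng)
towers = [TaskTower(d, rng) for _ in range(n_tasks)]
x = rng.normal(size=(4, 5, d))           # 4 examples, 5 feature tokens each
preds = forward(x, moe, towers)
```

In this sketch the experts are shared across tasks, while each task has its own gate and its own tower, so task-specific behaviour comes from the gating weights and the tower parameters.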
Recommendation Systems