23 Jan 2024 • Tamir David Hay, Lior Wolf
To reduce the number of trainable parameters in deep transformer networks, we employ reinforcement learning to dynamically select layers during training and tie their weights together.
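
To make the parameter saving from layer tying concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tying assignment is a list in which entry `i` names the layer whose weights layer `i` uses (itself, or an earlier layer). The names `count_trainable` and `random_tying` are hypothetical, and the random policy is only a stand-in for the reinforcement learning agent, whose details are not given here.

```python
import random


def count_trainable(tying, params_per_layer):
    """Count trainable parameters under a tying assignment.

    Assumed encoding (illustrative, not from the paper):
    tying[i] == i  -> layer i owns its weights
    tying[i] == j  -> layer i reuses the weights owned by layer j (j < i)
    """
    owners = set(tying)  # distinct layers that actually hold weights
    return len(owners) * params_per_layer


def random_tying(num_layers, rng):
    """Placeholder policy: pick, for each layer, itself or an earlier layer.

    This is a uniform random choice, NOT a reinforcement learning agent.
    """
    tying = []
    for i in range(num_layers):
        choice = rng.randint(0, i)  # inclusive on both ends
        # Follow the earlier layer's entry so every entry names an owner.
        tying.append(i if choice == i else tying[choice])
    return tying


if __name__ == "__main__":
    params_per_layer = 100  # hypothetical size, chosen for easy arithmetic
    untied = [0, 1, 2, 3]   # every layer owns its weights
    tied = [0, 0, 2, 0]     # layers 1 and 3 reuse the weights of layer 0
    print(count_trainable(untied, params_per_layer))  # 400
    print(count_trainable(tied, params_per_layer))    # 200
    print(random_tying(6, random.Random(0)))
```

In this toy setting, tying two of four equally sized layers to layer 0 halves the trainable parameter count, while the network depth (number of layer applications) is unchanged.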