1 code implementation • 22 Oct 2020 • Aaron Baier-Reinio, Hans De Sterck
We use neural ordinary differential equations to formulate a variant of the Transformer that is depth-adaptive in the sense that an input-dependent number of time steps is taken by the ordinary differential equation solver.
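To make "input-dependent number of time steps" concrete, here is a minimal sketch of an adaptive-step ODE solve. Everything in it is an assumption for illustration: `toy_layer` is a hypothetical cubic vector field standing in for the learned Transformer dynamics, and the embedded Euler/Heun pair with tolerance `tol` is a generic textbook step-size controller, not necessarily the solver used by the authors.

```python
import math


def toy_layer(t, x):
    # Hypothetical stand-in for a learned layer function f(x, t); NOT the paper's model.
    return [-xi ** 3 for xi in x]


def adaptive_solve(f, x0, t0=0.0, t1=1.0, tol=1e-4, h0=0.1):
    """Integrate dx/dt = f(t, x) over [t0, t1] with an embedded Euler/Heun pair.

    Returns the final state and the number of accepted steps.
    """
    t, x, h, steps = t0, list(x0), h0, 0
    while t < t1:
        h = min(h, t1 - t)  # never step past the end time
        k1 = f(t, x)
        euler = [xi + h * a for xi, a in zip(x, k1)]
        k2 = f(t + h, euler)
        heun = [xi + 0.5 * h * (a + b) for xi, a, b in zip(x, k1, k2)]
        # The difference between the two estimates approximates the local error.
        err = max(abs(a - b) for a, b in zip(euler, heun))
        if err <= tol:
            t, x, steps = t + h, heun, steps + 1  # accept the step
        # Grow or shrink the step size, with the change clipped to [0.2, 2].
        factor = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
        h *= min(2.0, max(0.2, factor))
    return x, steps


if __name__ == "__main__":
    for x0 in ([0.1], [3.0]):
        x1, steps = adaptive_solve(toy_layer, x0)
        print(f"x0={x0[0]}: x(1)={x1[0]:.4f}, accepted steps={steps}")
```

In this toy setting the larger input produces faster dynamics, so the controller accepts more (smaller) steps for it than for the small input. That is the sense in which the effective depth adapts to the input.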