no code implementations • 4 Oct 2022 • Omri Raccah, Phoebe Chen, Ted L. Willke, David Poeppel, Vy A. Vo
The computational complexity of the self-attention mechanism in Transformer models significantly limits their ability to generalize over long temporal durations.
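The quadratic cost referred to here can be made concrete with a minimal sketch of naive single-head self-attention; this is a generic illustration of the scaling bottleneck, not code from the paper, and the shapes and function name are assumptions for the example.

```python
import torch

def self_attention(x):
    """Naive single-head self-attention, illustrative only.

    The score matrix has shape (T, T), so compute and memory grow
    quadratically with sequence length T -- the bottleneck that
    limits Transformers over long temporal durations.
    """
    T, d = x.shape
    q, k, v = x, x, x                # untrained projections, kept trivial for brevity
    scores = q @ k.T / d ** 0.5      # (T, T): quadratic in T
    weights = torch.softmax(scores, dim=-1)
    return weights @ v               # (T, d)

x = torch.randn(1024, 64)            # doubling T quadruples the score matrix
out = self_attention(x)
```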
no code implementations • 12 May 2021 • Hsiang-Yun Sherry Chien, Javier S. Turek, Nicole Beckage, Vy A. Vo, Christopher J. Honey, Ted L. Willke
Altogether, we found that LSTMs with the proposed forget gate can learn long-term dependencies, outperforming other recurrent networks in multiple domains; such a gating mechanism can be integrated into other architectures to improve the learning of long-timescale information in recurrent neural networks.
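As a rough illustration of biasing a forget gate toward long timescales, the sketch below applies a chrono-style bias initialization (Tallec & Ollivier, 2018) to a standard PyTorch LSTM; it is an assumed stand-in, not the specific forget-gate mechanism proposed in the paper, and `bias_forget_gate` and `max_timescale` are hypothetical names for the example.

```python
import torch
import torch.nn as nn

def bias_forget_gate(lstm: nn.LSTM, max_timescale: float = 100.0):
    """Illustrative only: push a standard LSTM's forget gate toward
    retaining information over long timescales (chrono-style init),
    not the gate proposed in the paper.
    """
    hidden = lstm.hidden_size
    for name, param in lstm.named_parameters():
        if "bias" in name:
            # PyTorch packs gate biases in order [input, forget, cell, output]
            with torch.no_grad():
                forget = param[hidden:2 * hidden]
                # a large positive bias keeps the forget gate near 1,
                # so the cell state decays slowly
                forget.copy_(torch.log(
                    torch.empty(hidden).uniform_(1.0, max_timescale)))

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
bias_forget_gate(lstm)
out, _ = lstm(torch.randn(8, 200, 32))   # run on a long sequence
```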