no code implementations • 21 Jan 2024 • Zhigang Wang, Hangyu Yang, Ning Wang, Chuanfei Xu, Jie Nie, Zhiqiang Wei, Yu Gu, Ge Yu
However, training its complex network is very space-consuming, since a lot of intermediate data are preserved across layers, especially when processing high-dimension inputs with a big batch size.