Convolutional Tensor-Train LSTM for Long-Term Video Prediction

25 Sep 2019 · Jiahao Su, Wonmin Byeon, Furong Huang, Jan Kautz, Animashree Anandkumar ·

Long-term video prediction is highly challenging since it entails simultaneously capturing spatial and temporal information across a long range of image frames.Standard recurrent models are ineffective since they are prone to error propagation and cannot effectively capture higher-order correlations. A potential solution is to extend to higher-order spatio-temporal recurrent models. However, such a model requires a large number of parameters and operations, making it intractable to learn in practice and is prone to overfitting. In this work, we propose convolutional tensor-train LSTM (Conv-TT-LSTM), which learns higher-orderConvolutional LSTM (ConvLSTM) efficiently using convolutional tensor-train decomposition (CTTD). Our proposed model naturally incorporates higher-order spatio-temporal information at a small cost of memory and computation by using efficient low-rank tensor representations. We evaluate our model on Moving-MNIST and KTH datasets and show improvements over standard ConvLSTM and better/comparable results to other ConvLSTM-based approaches, but with much fewer parameters.

PDF Abstract