A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

27 Nov 2022 · Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam

We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches, which serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all series share the same embedding and Transformer weights. The patching design has a three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) can significantly improve long-term forecasting accuracy compared with SOTA Transformer-based models. We also apply our model to self-supervised pre-training tasks and attain excellent fine-tuning performance, which outperforms supervised training on large datasets. Transferring masked pre-trained representations from one dataset to others also produces SOTA forecasting accuracy. Code is available at: https://github.com/yuqinie98/PatchTST.
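The two components above (patching and channel-independence), plus the patch masking used for self-supervised pre-training, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the linked repository). The helper names `make_patches`, `channel_independent_tokens`, and `mask_patches` are hypothetical. The defaults (look-back L = 512, patch length P = 16, stride S = 8, end-padding with S copies of the last value) are assumptions chosen so that N = floor((L - P) / S) + 2 = 64 patches, one plausible reading of the "/64" in PatchTST/64. The 40% mask ratio and d_model = 128 are likewise only illustrative. The Transformer encoder itself is omitted.

```python
import numpy as np


def make_patches(x, patch_len=16, stride=8):
    """Split a univariate series of shape (L,) into patches of shape (N, patch_len).

    Assumption: the end is padded with `stride` copies of the last value, so
    N = (L - patch_len) // stride + 2.
    """
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([x, np.repeat(x[-1], stride)])
    n_patches = (len(padded) - patch_len) // stride + 1
    # Row i holds the window of length patch_len starting at i * stride.
    starts = stride * np.arange(n_patches)[:, None]
    idx = starts + np.arange(patch_len)[None, :]
    return padded[idx]


def channel_independent_tokens(batch, W, patch_len=16, stride=8):
    """Embed a multivariate batch of shape (B, M, L) channel by channel.

    Every channel is treated as its own univariate series: the batch is
    flattened to (B * M, L) and every channel is projected by the same
    matrix W of shape (patch_len, d_model). Returns (B, M, N, d_model).
    """
    B, M, L = batch.shape
    flat = batch.reshape(B * M, L)
    patches = np.stack([make_patches(s, patch_len, stride) for s in flat])
    tokens = patches @ W  # shared weights; no operation mixes channels
    return tokens.reshape(B, M, patches.shape[1], W.shape[1])


def mask_patches(patches, ratio=0.4, seed=None):
    """Zero out a random subset of patches for masked pre-training.

    Returns the masked copy and a boolean mask marking which patches a
    model would be asked to reconstruct. The ratio is an illustrative default.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(round(ratio * n)), replace=False)] = True
    masked = patches.copy()
    masked[mask] = 0.0
    return masked, mask


if __name__ == "__main__":
    L = 512
    series = np.sin(np.linspace(0, 20, L))
    p = make_patches(series)
    print(p.shape)  # (64, 16)
    # Attention maps are N x N instead of L x L: 512**2 / 64**2 entries.
    print(L ** 2 // p.shape[0] ** 2)  # 64, i.e. roughly stride**2

    batch = np.random.default_rng(0).normal(size=(2, 7, L))  # B=2, M=7 channels
    W = np.random.default_rng(1).normal(size=(16, 128))       # hypothetical d_model=128
    print(channel_independent_tokens(batch, W).shape)          # (2, 7, 64, 128)

    masked, mask = mask_patches(p, seed=0)
    print(mask.sum())  # 26 of the 64 patches are masked
```

The second print illustrates the "quadratic" saving: with stride S the token count drops from L to roughly L / S, so each attention map shrinks by roughly a factor of S². Because `W` is applied identically to every row of the flattened batch, adding channels adds tokens but no new parameters.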

Leaderboard results. Task: Time Series Forecasting; Model: PatchTST/64. The number in parentheses is the prediction horizon; each rank is the global rank on that benchmark's leaderboard; "-" means no value is listed.

Dataset (horizon)   Setting        MSE     MSE rank   MAE     MAE rank
Electricity (96)    -              0.129   #1         -       -
Electricity (192)   -              0.147   #1         -       -
Electricity (336)   -              0.163   #2         -       -
Electricity (720)   -              0.197   #3         -       -
ETTh1 (96)          Multivariate   0.37    #3         0.4     #1
ETTh1 (96)          Univariate     0.059   #6         0.189   #1
ETTh1 (192)         Multivariate   0.413   #8         0.429   #1
ETTh1 (192)         Univariate     0.074   #5         0.215   #1
ETTh1 (336)         Multivariate   0.422   #3         0.44    #7
ETTh1 (336)         Univariate     0.076   #2         0.22    #8
ETTh1 (720)         Multivariate   0.447   #4         0.468   #7
ETTh1 (720)         Univariate     0.087   #4         0.236   #9
ETTh2 (96)          Multivariate   0.274   #5         0.337   #4
ETTh2 (96)          Univariate     0.131   #5         0.284   #1
ETTh2 (192)         Multivariate   0.341   #5         0.382   #3
ETTh2 (192)         Univariate     0.171   #4         0.329   #2
ETTh2 (336)         Multivariate   0.329   #3         0.384   #9
ETTh2 (336)         Univariate     0.171   #4         0.336   #7
ETTh2 (720)         Multivariate   0.379   #1         0.422   #11
ETTh2 (720)         Univariate     0.223   #5         0.38    #5
Weather (96)        -              0.149   #4         -       -
Weather (192)       -              0.194   #4         -       -
Weather (336)       -              0.245   #3         -       -
Weather (720)       -              0.314   #2         -       -
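Both metrics reported for these benchmarks are errors, so lower values are better and rank #1 marks the best entry on a leaderboard. Below is a minimal sketch of the standard definitions, assuming the error is averaged over every forecast value (all time steps and all channels). Whether a given benchmark normalizes the data first is not stated here, and the function names and toy numbers are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error over every forecast value (all steps, all channels)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error over every forecast value (all steps, all channels)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))


# Toy forecast for 2 channels over a 3-step horizon (hypothetical numbers).
truth = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
forecast = [[1.0, 3.0, 5.0], [0.0, 0.0, 1.0]]
print(mse(truth, forecast))  # (0 + 1 + 4 + 0 + 0 + 1) / 6 = 1.0
print(mae(truth, forecast))  # (0 + 1 + 2 + 0 + 0 + 1) / 6, about 0.667
```

Because MSE squares each error, it weights large misses more heavily than MAE does, which is one reason the same entry can rank quite differently on the two metrics (for example ETTh2 (720) Multivariate: #1 on MSE, #11 on MAE).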

Methods