As easy as APC: overcoming missing data and class imbalance in time series with self-supervised learning

29 Jun 2021 · Fiorella Wever, T. Anderson Keller, Laura Symul, Victor Garcia

High levels of missing data and strong class imbalance are ubiquitous challenges that often occur together in real-world time series data. Existing methods address these problems separately, frequently making significant assumptions about the underlying data generation process in order to lessen the impact of missing information. In this work, we instead demonstrate how a general self-supervised training method, Autoregressive Predictive Coding (APC), can be leveraged to overcome both missing data and class imbalance simultaneously without strong assumptions. Specifically, on a synthetic dataset, we show that APC substantially improves on standard baselines, with the greatest gains in the combined setting of high missingness and severe class imbalance. We further apply APC to two real-world medical time-series datasets and show that it improves classification performance in all settings, ultimately achieving state-of-the-art AUPRC results on the PhysioNet benchmark.
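
To make the APC objective more concrete: the model reads a time series up to step t and is trained to predict the value n steps ahead. The sketch below is an illustration under stated assumptions, not the authors' implementation. It assumes a univariate series, an L1 (absolute) prediction error, and that targets flagged as missing are simply skipped. The function name `apc_l1_loss` and its arguments are hypothetical.

```python
def apc_l1_loss(predictions, series, observed, n=1):
    """Mean absolute error between predictions[t] and series[t + n].

    predictions[t] is a model's guess for the value n steps after t.
    observed[t] is True when series[t] was actually measured.
    Unobserved targets are skipped (an assumption of this sketch).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    total, count = 0.0, 0
    for t in range(len(series) - n):
        if observed[t + n]:
            total += abs(predictions[t] - series[t + n])
            count += 1
    # Return 0.0 when no target in range was observed.
    return total / count if count else 0.0


series = [1.0, 2.0, 0.0, 4.0]         # 0.0 is a placeholder for a missing value
observed = [True, True, False, True]  # third measurement is missing
predictions = [2.5, 3.0, 3.0, 0.0]    # hypothetical model outputs
loss = apc_l1_loss(predictions, series, observed, n=1)
```

Because the target is the series itself, this objective needs no class labels, so every sequence contributes to training regardless of how rare its class is.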

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Time Series Classification | PhysioNet Challenge 2012 | GRU-D - APC (n = 1) | AUPRC | 55.1 | # 1 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU | AUPRC | 51.4 | # 8 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-Simple | AUPRC | 53.8 | # 2 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-D | AUPRC | 53.1 | # 6 |
| Time Series Analysis | PhysioNet Challenge 2012 | GRU-D - APC (n = 1) | F1 | 27.3 | # 2 |
| Time Series Analysis | PhysioNet Challenge 2012 | GRU-APC (n = 1) | F1 | 25.7 | # 3 |
| Time Series Analysis | PhysioNet Challenge 2012 | GRU-D | F1 | 22.5 | # 4 |
| Time Series Analysis | PhysioNet Challenge 2012 | GRU-Simple | F1 | 22.2 | # 6 |
| Time Series Analysis | PhysioNet Challenge 2012 | GRU-Mean | F1 | 22.1 | # 7 |
| Time Series Analysis | PhysioNet Challenge 2012 | GRU | F1 | 22.3 | # 5 |
| Time Series Analysis | PhysioNet Challenge 2012 | naive classifier | F1 | 87.47 | # 1 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-APC (n = 1) | AUPRC | 53.5 | # 4 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-D - APC (n = 0) | AUPRC | 53.3 | # 5 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-APC (n = 0) | AUPRC | 50.4 | # 9 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-Forward | AUPRC | 52 | # 7 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-Mean | AUPRC | 50.3 | # 10 |
| Time Series Classification | PhysioNet Challenge 2012 | BRITS [4] | AUROC | 0.85 | # 2 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-D [4] | AUROC | 0.834 | # 4 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-D [12] | AUROC | 0.863 | # 1 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-D [12] | AUPRC | 53.7 | # 3 |
| Time Series Classification | PhysioNet Challenge 2012 | GRU-D [6] | AUROC | 0.8424 | # 3 |
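
For readers unfamiliar with AUPRC, the sketch below computes average precision, one common estimator of the area under the precision-recall curve. It is an illustration only: it is not necessarily the estimator used for the benchmark numbers above, it assumes binary labels (1 = positive) and ignores tied scores, and the function name `average_precision` is hypothetical.

```python
def average_precision(labels, scores):
    """Average of the precision values at the rank of each positive example."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` examples
    return total / positives


# Imbalanced toy data: 2 positives out of 10 examples.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
ap = average_precision(labels, scores)
prevalence = sum(labels) / len(labels)  # fraction of positive examples
```

Comparing `ap` with `prevalence` is a useful sanity check under class imbalance, since a scorer that ranks examples at random is expected to achieve an average precision close to the positive-class rate.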

Methods