DM-CT: Consistency Training with Data and Model Perturbation

29 Sep 2021 · Xiaobo Liang, Runze Mao, Lijun Wu, Juntao Li, Weiqing Liu, Qing Li, Min Zhang

Consistency training has been widely adopted and has shown great promise in deep learning. The common approach performs consistency training at the data level, typically using a data augmentation strategy (or adversarial training) to make the predictions from the augmented input and the original input consistent, so that the model becomes more robust and generalizes better. Recently, consistency training has also been applied at the model level, where the randomness that exists in the model (e.g., dropout) is constrained during training so that the inference-time model is more consistent with the training phase. In this work, we investigate these two aspects and propose an integrated framework, DM-CT, that incorporates both data-level and model-level consistency training. Concretely, the input data is first augmented, and the output distributions of different sub-models generated by model variance are forced to be consistent (model level). Meanwhile, the predictions of the original input and the augmented one are constrained to be consistent (data level). We study different data augmentation strategies and model variances within the DM-CT framework. Experiments on different tasks, including neural machine translation ($4$ IWSLT14 translation tasks, a multilingual translation task, and WMT16 Romanian$\to$English translation), natural language understanding (the GLUE benchmark), and image classification (CIFAR-100), demonstrate the superiority of DM-CT, which obtains significant and consistent performance improvements.
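A minimal sketch of the training objective described above, not the authors' released implementation: two stochastic forward passes on the augmented input approximate the model-level consistency term, and a third pass on the original input supplies the data-level term. The loss weights `alpha` and `beta`, the symmetric-KL choice, and the classification-style task loss are illustrative assumptions.

```python
# Hedged sketch of a DM-CT-style loss (assumptions noted above), not official code.
import torch
import torch.nn.functional as F


def kl_consistency(p_logits, q_logits):
    """Symmetric KL divergence between two predicted distributions."""
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    return 0.5 * (
        F.kl_div(q_log, p_log.exp(), reduction="batchmean")
        + F.kl_div(p_log, q_log.exp(), reduction="batchmean")
    )


def dm_ct_loss(model, x, x_aug, y, alpha=1.0, beta=1.0):
    """Task loss plus model-level and data-level consistency terms.

    Model level: two stochastic passes on the same augmented input
    (e.g., different dropout masks) should agree.
    Data level: predictions on the original and augmented inputs should agree.
    """
    logits_orig = model(x)        # original input
    logits_aug1 = model(x_aug)    # augmented input, dropout mask A
    logits_aug2 = model(x_aug)    # augmented input, dropout mask B

    task_loss = F.cross_entropy(logits_aug1, y)
    model_level = kl_consistency(logits_aug1, logits_aug2)
    data_level = kl_consistency(logits_orig, logits_aug1)

    return task_loss + alpha * model_level + beta * data_level
```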
