Continual Learning for Seq2Seq Generations with Transformer Calibration

ACL ARR January 2022 · Anonymous

Conventional NLP generation models are trained offline on a given dataset for a particular task, a setting referred to as isolated learning. Research on continual sequence-to-sequence language generation instead aims to build models that keep learning from sequentially encountered tasks. However, continual learning models often suffer from catastrophic forgetting, a persistent challenge for lifelong learning. In this paper, we present a novel NLP transformer model that attempts to mitigate catastrophic forgetting in online continual learning from a new perspective: attention calibration. We model the attention in the transformer as a calibrated unit in a general formulation, in which attention calibration can help balance the stability and plasticity of continual learning algorithms by influencing both their forward inference path and their backward optimization path. Our experiments on paraphrase generation show that this model outperforms SOTA models by a considerable margin and greatly reduces forgetting.
