Meta Attention Networks: Meta-Learning Attention to Modulate Information Between Recurrent Independent Mechanisms
Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning agent interacting with its environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly relevant for generalizing in a systematic way to out-of-distribution changes. To study these ideas, we propose a particular training framework in which we assume that the pieces of knowledge an agent needs, as well as its reward function, are stationary and can be re-used across tasks. We focus on pieces of knowledge captured by an ensemble of modules sparsely communicating with each other via a bottleneck of attention. The attention mechanisms dynamically select which modules should be adapted, and the parameters of the selected modules change quickly as the learner is confronted with variations in what it experiences, while the parameters of the attention mechanisms act as slowly changing meta-parameters. We find that meta-learning the modular aspects of the proposed system greatly helps in achieving faster learning, in experiments with a reinforcement learning setup involving navigation in a partially observed grid world with image-level input. We also find that reversing the roles of the parameters and meta-parameters does not work nearly as well, suggesting a particular benefit from fast adaptation of the dynamically selected modules.
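To make the fast/slow split concrete, the sketch below shows one way the described architecture could look in PyTorch: an attention scorer selects which recurrent modules update their state, the selected modules' parameters are trained with a fast learning rate, and the attention parameters with a slow one. All names here (ModularRNN, fast_slow_update, the toy data and learning rates) are hypothetical, and the single-loss two-optimizer loop is a first-order simplification; this is not the authors' implementation, which would meta-learn the attention across tasks with a proper inner/outer loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModularRNN(nn.Module):
    """Ensemble of recurrent modules gated by an attention bottleneck.

    Only the top-k modules selected by the attention scorer update their
    state (and therefore receive gradients) at each step; the rest keep
    their previous state, mirroring the sparse communication described
    in the abstract.
    """

    def __init__(self, input_dim, hidden_dim, n_modules=4, top_k=2):
        super().__init__()
        self.n_modules, self.top_k, self.hidden_dim = n_modules, top_k, hidden_dim
        self.cells = nn.ModuleList(
            nn.GRUCell(input_dim, hidden_dim) for _ in range(n_modules)
        )
        # Attention scorer: its parameters play the role of the slowly
        # changing meta-parameters.
        self.attn = nn.Linear(input_dim + hidden_dim, n_modules)

    def forward(self, x, h):
        # x: (batch, input_dim); h: (batch, n_modules, hidden_dim)
        scores = self.attn(torch.cat([x, h.mean(dim=1)], dim=-1))
        probs = F.softmax(scores, dim=-1)
        # Hard top-k selection; the soft probs keep attention differentiable.
        topk = scores.topk(self.top_k, dim=-1).indices
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
        gate = (mask * probs).unsqueeze(-1)
        new_h = torch.stack(
            [cell(x, h[:, i]) for i, cell in enumerate(self.cells)], dim=1
        )
        # Unselected modules keep their previous hidden state.
        return h + gate * (new_h - h)


def fast_slow_update(model, loss_fn, episodes, fast_lr=1e-2, slow_lr=1e-4):
    """Two-timescale update: module parameters adapt fast, attention slowly."""
    fast_opt = torch.optim.SGD(model.cells.parameters(), lr=fast_lr)
    slow_opt = torch.optim.SGD(model.attn.parameters(), lr=slow_lr)
    for x_seq, target in episodes:  # x_seq: (T, batch, input_dim)
        h = x_seq.new_zeros(x_seq.size(1), model.n_modules, model.hidden_dim)
        for x in x_seq:
            h = model(x, h)
        loss = loss_fn(h.mean(dim=1), target)
        fast_opt.zero_grad()
        slow_opt.zero_grad()
        loss.backward()
        fast_opt.step()  # fast: adapt the selected modules' parameters
        slow_opt.step()  # slow: attention acts as meta-parameters


# Toy usage with random data, purely to show the two-timescale loop.
torch.manual_seed(0)
model = ModularRNN(input_dim=8, hidden_dim=16)
episodes = [(torch.randn(5, 32, 8), torch.randn(32, 16)) for _ in range(3)]
fast_slow_update(model, F.mse_loss, episodes)
```

Because the hard mask zeroes the state update of unselected modules, only the selected modules receive gradients at each step, which is what lets their parameters adapt quickly while the attention parameters drift slowly; swapping the two learning rates reproduces the "reversed roles" ablation the abstract reports as working poorly.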