More practically, we evaluate these models on the task of learning to execute partial programs, as might arise if using the model as a heuristic function in program synthesis.
Human perception is structured around objects, which form the basis for our higher-level cognition and impressive systematic generalization abilities.
Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack.
Ranked #1 on NetHack Score on NetHack Learning Environment
The recent success of natural language understanding (NLU) systems has been troubled by results highlighting the failure of these models to generalize in a systematic and robust way.
Numerous models for grounded language understanding have recently been proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated.
Humans are remarkably flexible when understanding new sentences that include combinations of concepts they have never encountered before.
In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding.
In this work, we study how systematic the generalization of such models is, that is, to what extent they are capable of handling novel combinations of known linguistic constructs.
However, state-of-the-art models in grounded question answering often do not explicitly perform decomposition, leading to difficulties in generalization to out-of-distribution examples.