Unsupervised Visual Program Induction with Function Modularization

29 Sep 2021 · Xuguang Duan, Xin Wang, Ziwei Zhang, Wenwu Zhu

Program induction is one way to emulate human reasoning. However, existing methods can only handle simple scenarios (Fig.~\ref{fig:task_examples}(a),(b)). In complex scenes, e.g., visual scenes, current program induction methods fail due to the huge program action space. In this paper, we are, to the best of our knowledge, the first to tackle this problem. We propose a novel task named {\it unsupervised visual program induction}, which targets complex visual scenes that require complex primitive functions. Solving this task poses two challenges: i) modeling complex primitive functions for complex visual scenes is difficult, and ii) employing complex functions in unsupervised program induction leads to a huge and heterogeneous program action space. To tackle these challenges, we propose the Self-Exploratory Modularized Function (SEMF) model, which jointly models the selection of each primitive function and its parameters through a unified modular block. Moreover, a Monte-Carlo-Tree-Search (MCTS) based self-exploratory algorithm is proposed to explore the program space with the modularized functions as priors; the exploration results, in turn, guide the training of the modularized functions. Extensive experiments demonstrate that the proposed SEMF model outperforms all existing baselines in performance, training efficiency, and generalization ability.
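The abstract describes a "unified modular block" that jointly models function selection and parameter prediction, but no code is available. As a rough illustration only, here is a minimal PyTorch sketch of one plausible reading: one small block per primitive function, each emitting a selection logit and parameter logits from a shared state embedding. All names (`ModularFunctionBlock`, `select_head`, `param_head`) and all dimensions are assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ModularFunctionBlock(nn.Module):
    """One block per primitive function: jointly scores the function
    and predicts its argument logits from a shared state embedding.
    (Hypothetical sketch; the paper's architecture may differ.)"""

    def __init__(self, state_dim: int, hidden_dim: int, num_params: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        # Scalar logit: how suitable this function is for the current state.
        self.select_head = nn.Linear(hidden_dim, 1)
        # Parameter head: logits over this function's argument slots.
        self.param_head = nn.Linear(hidden_dim, num_params)

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        return self.select_head(h).squeeze(-1), self.param_head(h)

# Usage: one block per primitive; a softmax over the selection logits
# yields the function prior that can guide program-space exploration.
blocks = nn.ModuleList(
    [ModularFunctionBlock(state_dim=128, hidden_dim=64, num_params=10)
     for _ in range(5)]  # 5 hypothetical primitive functions
)
state = torch.randn(1, 128)
select_logits = torch.stack([b(state)[0] for b in blocks], dim=-1)
function_prior = torch.softmax(select_logits, dim=-1)  # shape (1, 5)
```

Keeping selection and parameter prediction inside one block per primitive is what lets heterogeneous functions (different arities, different argument types) coexist in a single action space.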
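Likewise, for the MCTS-based self-exploration, the following is a minimal sketch of prior-guided tree search using a PUCT-style selection rule (as in AlphaZero-like systems), with the modular blocks' softmax output serving as the prior. The paper's actual algorithm may differ; `Node`, `puct_select`, `expand`, and `backup` are hypothetical names.

```python
import math
import random

class Node:
    def __init__(self, prior: float):
        self.prior = prior      # probability from the modular blocks
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}      # action index -> Node

    def value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node: Node, c: float = 1.0):
    """PUCT rule: exploit estimated value, explore in proportion to
    the prior. Assumes node.children is non-empty."""
    total = math.sqrt(sum(ch.visits for ch in node.children.values()) + 1)
    return max(
        node.children.items(),
        key=lambda kv: kv[1].value() + c * kv[1].prior * total / (1 + kv[1].visits),
    )

def expand(node: Node, priors):
    for action, p in enumerate(priors):
        node.children[action] = Node(prior=float(p))

def backup(path, reward: float):
    for n in path:
        n.visits += 1
        n.value_sum += reward

# Usage sketch: expand the root with priors from the modular blocks,
# then repeatedly select, execute, and back up the reward.
root = Node(prior=1.0)
expand(root, [0.4, 0.3, 0.2, 0.05, 0.05])  # priors over 5 primitives
for _ in range(100):
    action, child = puct_select(root)
    reward = random.random()  # placeholder for program-execution reward
    backup([root, child], reward)
```

In this reading, exploration discovers programs whose rewards then retrain the modular blocks, closing the self-exploratory loop the abstract describes.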
