Automatic Source Code Summarization via Reinforcement Learning

CUHK Course IERG5350 2020 · Zhuangbin Chen

Owing to the rapid development of computing and software, the volume of code in existence has reached an unprecedented level. For large-scale systems (e.g., cloud computing systems) with billions of lines of code, the majority of the maintenance effort goes into code management, and much of that effort is spent on understanding the relevant source code. With high-quality code summaries, one can quickly understand what a function does, even without reading the code. However, writing good comments for source code is nontrivial for a programmer. If code summaries could be generated automatically, the whole software development pipeline could be greatly accelerated. To this end, in this project we develop a reinforcement learning framework to improve automatic code summarization. By properly defining the key modules, we employ an actor-critic network to solve this problem. Experimental results demonstrate that our model outperforms a vanilla sequence-to-sequence model by a noticeable margin.
