Summarizing Source Code from Structure and Context

IEEE International Joint Conference on Neural Networks (IJCNN), 2022

Modern software developers often turn to social coding platforms to reuse code snippets and speed up development, but the code on such platforms frequently has mismatched, missing, or outdated comments. This hinders code search and comprehension, and increases the maintenance burden for software built on this code. Because summarizing code is beneficial but very expensive to do manually, in this paper we propose an automatic and effective code summarization paradigm to address this laborious task. We represent a given code snippet as an abstract syntax tree (AST) and generate a set of compositional root-to-leaf paths that expose the code's context and structure in a less complex yet expressive form. On these paths, we design a tree-based transformer model, called TreeXFMR, that summarizes source code through hierarchical attention. This yields two advantages for code representation learning: (1) attention mechanisms at the token level and the path level attend to the semantics and interactions of source code from different aspects; (2) bi-level positional encodings reveal the intra- and inter-path structure of the AST and reduce ambiguity in the representations. During decoding, TreeXFMR attends to these learned representations to produce each natural language word of the output. We further pre-train the transformer to achieve faster and better training convergence. Extensive experiments on a code collection from GitHub demonstrate the effectiveness of TreeXFMR, which significantly outperforms state-of-the-art baselines.
code summarization, abstract syntax tree, neural machine translation, transformer
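
The root-to-leaf path representation described in the abstract can be illustrated with a small sketch. It uses Python's standard `ast` module on a Python snippet purely for illustration; the abstract does not state which language, parser, or node labeling TreeXFMR uses, so the function names `root_to_leaf_paths` and `bilevel_positions`, the label format, and the choice to skip `Load`/`Store` context nodes are all assumptions. The second function shows one plausible reading of "bi-level" positions, pairing each token with its path index (inter-path) and its index within the path (intra-path); it is not the paper's encoding.

```python
import ast


def root_to_leaf_paths(source):
    """Return every root-to-leaf path of the AST as a list of node labels.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    tree = ast.parse(source)
    paths = []

    def label(node):
        # Keep identifiers and constants visible; otherwise use the node type.
        if isinstance(node, ast.Name):
            return f"Name:{node.id}"
        if isinstance(node, ast.arg):
            return f"arg:{node.arg}"
        if isinstance(node, ast.Constant):
            return f"Constant:{node.value!r}"
        return type(node).__name__

    def walk(node, prefix):
        current = prefix + [label(node)]
        # Assumption: Load/Store/Del context nodes carry no summary-relevant
        # information, so they are not treated as children.
        children = [
            child
            for child in ast.iter_child_nodes(node)
            if not isinstance(child, ast.expr_context)
        ]
        if not children:
            paths.append(current)
            return
        for child in children:
            walk(child, current)

    walk(tree, [])
    return paths


def bilevel_positions(paths):
    """Pair each token with (path index, index within path, label).

    An assumed, simplified stand-in for bi-level positional information.
    """
    return [
        (path_idx, token_idx, token)
        for path_idx, path in enumerate(paths)
        for token_idx, token in enumerate(path)
    ]


if __name__ == "__main__":
    snippet = "def add(a, b):\n    return a + b\n"
    for path in root_to_leaf_paths(snippet):
        print(" > ".join(path))
```

In this sketch, paths that share a prefix (for example, the two argument paths under the same `arguments` node) differ only in their final tokens, which is why a path-level index alongside the token-level index helps distinguish otherwise similar sequences.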