代码段
计算机科学
程序理解
程序设计语言
背景(考古学)
源代码
编码(集合论)
页眉
抽象语法树
解析
情报检索
人工智能
软件
软件系统
古生物学
计算机网络
集合(抽象数据类型)
生物
作者
Hanyang Guo,Xiangping Chen,Yuan Huang,Yanlin Wang,Xi Ding,Zibin Zheng,Xiaocong Zhou,Hong‐Ning Dai
摘要
Code commenting plays an important role in program comprehension. Automatic comment generation helps improve software maintenance efficiency. The code comments to annotate a method mainly include header comments and snippet comments. The header comment aims to describe the functionality of the entire method, thereby providing a general comment at the beginning of the method. The snippet comment appears at multiple code segments in the body of a method, where a code segment is called a code snippet. Both of them help developers quickly understand code semantics, thereby improving code readability and code maintainability. However, existing automatic comment generation models mainly focus more on header comments, because there are public datasets to validate the performance. By contrast, it is challenging to collect datasets for snippet comments, because it is difficult to determine their scope. Even worse, code snippets are often too short to capture complete syntax and semantic information. To address this challenge, we propose a novel S nippet C omment Gen eration approach called SCGen . First, we utilize the context of the code snippet to expand the syntax and semantic information. Specifically, 600,243 snippet code-comment pairs are collected from 959 Java projects. Then, we capture variables from code snippets and extract variable-related statements from the context. After that, we devise an algorithm to parse and traverse abstract syntax tree (AST) information of code snippets and corresponding context. Finally, SCGen generates snippet comments after inputting the source code snippet and corresponding AST information into a sequence-to-sequence-based model. We conducted extensive experiments on the dataset we collected to evaluate our SCGen . Our approach obtains 18.23 in BLEU-4 metrics, 18.83 in METEOR, and 23.65 in ROUGE-L, which outperforms state-of-the-art comment generation models.
科研通智能强力驱动
Strongly Powered by AbleSci AI