Snippet Comment Generation Based on Code Context Expansion

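Code segment; Computer science; Program comprehension; Programming language; Context (archaeology); Source code; Coding (set theory); Header; Abstract syntax tree; Parsing; Information retrieval; Artificial intelligence; Software; Software system; Paleontology; Computer network; Set (abstract data type); Biology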

Authors

Hanyang Guo, Xiangping Chen, Yuan Huang, Yanlin Wang, Xi Ding, Zibin Zheng, Xiaocong Zhou, Hong‐Ning Dai

Source

Journal: ACM Transactions on Software Engineering and Methodology [Association for Computing Machinery]
Date: 2023-07-31 Volume (Issue): 33 (1): 1-30

Identifier

DOI: 10.1145/3611664

Abstract

Code commenting plays an important role in program comprehension. Automatic comment generation helps improve software maintenance efficiency. The code comments to annotate a method mainly include header comments and snippet comments. The header comment aims to describe the functionality of the entire method, thereby providing a general comment at the beginning of the method. The snippet comment appears at multiple code segments in the body of a method, where a code segment is called a code snippet. Both of them help developers quickly understand code semantics, thereby improving code readability and code maintainability. However, existing automatic comment generation models mainly focus more on header comments, because there are public datasets to validate the performance. By contrast, it is challenging to collect datasets for snippet comments, because it is difficult to determine their scope. Even worse, code snippets are often too short to capture complete syntax and semantic information. To address this challenge, we propose a novel Snippet Comment Generation approach called SCGen. First, we utilize the context of the code snippet to expand the syntax and semantic information. Specifically, 600,243 snippet code-comment pairs are collected from 959 Java projects. Then, we capture variables from code snippets and extract variable-related statements from the context. After that, we devise an algorithm to parse and traverse abstract syntax tree (AST) information of code snippets and corresponding context. Finally, SCGen generates snippet comments after inputting the source code snippet and corresponding AST information into a sequence-to-sequence-based model. We conducted extensive experiments on the dataset we collected to evaluate our SCGen. Our approach obtains 18.23 in BLEU-4 metrics, 18.83 in METEOR, and 23.65 in ROUGE-L, which outperforms state-of-the-art comment generation models.
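
The context-expansion step mentioned in the abstract (capture variables from a snippet, then pull variable-related statements from the surrounding method) can be illustrated with a rough sketch. This is not the SCGen implementation: the function names `identifiers` and `expand_context`, the regex-based variable detection, and the abbreviated keyword list are all assumptions made for illustration. A real tool would presumably work on a parsed AST rather than on raw text, and this sketch ignores string literals and comments.

```python
import re

# Hypothetical, abbreviated list of Java keywords and literals to ignore.
JAVA_KEYWORDS = {
    "int", "long", "double", "float", "boolean", "char", "byte", "short",
    "void", "for", "while", "do", "if", "else", "switch", "case", "break",
    "continue", "return", "new", "final", "static", "this", "null", "true",
    "false", "try", "catch", "finally", "throw",
}

# Assumption: variables start with a lowercase letter or underscore and are
# not directly followed by "(" (which would indicate a method call).
VARIABLE = re.compile(r"\b[a-z_][A-Za-z0-9_]*\b(?!\s*\()")


def identifiers(statement):
    """Return the set of likely variable names used in one Java statement."""
    return {name for name in VARIABLE.findall(statement)
            if name not in JAVA_KEYWORDS}


def expand_context(method_lines, start, end):
    """Collect snippet variables and the context statements that share them.

    method_lines: statements of the enclosing method, one per entry.
    start, end:   half-open index range [start, end) of the snippet.
    """
    variables = set()
    for line in method_lines[start:end]:
        variables |= identifiers(line)
    context = [
        line.strip()
        for i, line in enumerate(method_lines)
        if not (start <= i < end) and identifiers(line) & variables
    ]
    return variables, context


method_body = [
    "int total = 0;",
    "String label = name.trim();",
    "List<Integer> values = load();",
    "for (int v : values) {",
    "    total += v;",
    "}",
    "log(label);",
]
variables, context = expand_context(method_body, 3, 6)
print(sorted(variables))
print(context)
```

Here the snippet is the `for` loop. The statements declaring `total` and `values` are kept as context because they share variables with the snippet, while the statements that only touch `label` are dropped.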
