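Computer science
Code segment
Source code
Coding (set theory)
Syntax
Programming language
Natural language processing
Abstract syntax tree
Artificial intelligence
KPI-driven code analysis
Object code
Semantics (computer science)
Abstract syntax
Precision and recall
Code review
Information retrieval
Static program analysis
Code generation
Software
Software development
Key (lock)
Set (abstract data type)
Computer security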
Authors
Yuan Huang, Xinyu Hu, Nan Jia, Xiangping Chen, Zibin Zheng, Xiapu Luo
Identifier
DOI:10.1016/j.jss.2020.110754
Abstract
Existing techniques for automatic code commenting assume that the code snippet to be commented has already been identified, and thus require users to provide the snippet in advance. A smarter commenting approach would first determine by itself where to comment in a given source file and then generate comments for the code snippets that need them. As the first step toward this goal, we propose a novel method, CommtPst, to automatically find appropriate commenting positions in source code. Since commenting is closely related to code syntax and semantics, we adopt a neural language model (word embeddings) to capture the semantic information of the code, and analyze abstract syntax trees to capture its syntactic information. We then employ an LSTM (long short-term memory) network to model the long-term logical dependencies among code statements over the fused semantic and syntactic information, and to learn the commenting patterns in the code sequence. We evaluated CommtPst on large data sets drawn from dozens of open-source software systems on GitHub. The experimental results show that CommtPst achieves precision, recall, and F-measure values of 0.792, 0.602, and 0.684, respectively, outperforming the traditional machine learning method with an 11.4% improvement in F-measure.
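As a quick consistency check on the reported scores, the short sketch below recomputes the F-measure from the precision and recall quoted in the abstract. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the abstract's F-measure is F1 (equal weight on
    precision and recall); the abstract itself does not say so.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are 0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract for CommtPst.
reported_precision = 0.792
reported_recall = 0.602

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 3))  # 0.684
```

Under that assumption the recomputed value agrees with the reported F-measure of 0.684 to three decimal places.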