素描
计算机科学
编码(集合论)
代码段
代码生成
源代码
代码重用
重新使用
草图识别
人工智能
冗余代码
程序设计语言
情报检索
自然语言处理
钥匙(锁)
算法
软件
生态学
手势识别
手势
计算机安全
集合(抽象数据类型)
生物
作者
Jia Li,Yongmin Li,Ge Li,Zhi Jin,Yiyang Hao,Xing Hu
标识
DOI:10.1109/icse48619.2023.00179
摘要
Recently, deep learning techniques have shown great success in automatic code generation. Inspired by the code reuse, some researchers propose copy-based approaches that can copy the content from similar code snippets to obtain better performance. Practically, human developers recognize the content in the similar code that is relevant to their needs, which can be viewed as a code sketch. The sketch is further edited to the desired code. However, existing copy-based approaches ignore the code sketches and tend to repeat the similar code without necessary modifications, which leads to generating wrong results. In this paper, we propose a sketch-based code generation approach named Skcoderto mimic developers' code reuse behavior. Given a natural language requirement, Skcoderretrieves a similar code snippet, extracts relevant parts as a code sketch, and edits the sketch into the desired code. Our motivations are that the extracted sketch provides a well-formed pattern for telling models "how to write". The post-editing further adds requirement-specific details into the sketch and outputs the complete code. We conduct experiments on two public datasets and a new dataset collected by this work. We compare our approach to 20 baselines using 5 widely used metrics. Experimental results show that (1) Skcodercan generate more correct programs, and outperforms the state-of-the-art -CodeT5-base by 30.30%, 35.39%, and 29.62% on three datasets. (2) Our approach is effective to multiple code generation models and improves them by up to 120.1% in Pass@l. (3) We investigate three plausible code sketches and discuss the importance of sketches. (4) We manually evaluate the generated code and prove the superiority of our Skcoderin three aspects.
科研通智能强力驱动
Strongly Powered by AbleSci AI