Coding (set theory)
Computer science
Artificial intelligence
Programming language
Set (abstract data type)
Authors
Qianwen Gou, Yunwei Dong, Wu Yujiao, Qiao Ke
Identifier
DOI:10.1016/j.jss.2024.111982
Abstract
Retrieval-augmented code generation strengthens a generation model by using a retrieval model to select relevant code snippets from a code corpus. The synergy between retrieval and generation ensures that the generated code closely matches the intended functionality. Existing methods, however, simply feed the retrieved results to the generation model, so if the retrieval corpus contains erroneous or sub-optimal code examples, the model may replicate these mistakes in the generated code. To tackle this problem, we propose RRGcode (Retrieval, Re-ranking, and Generation for code generation), a deep hierarchical search-based code generation framework that refines the initial ranking of retrieved code. This reduces the risk of replicating errors from the retrieval corpus and improves the quality and reliability of the generated code. Specifically, RRGcode first retrieves relevant code candidates from a large code corpus. A re-ranking model then reconstructs the search space through a detailed semantic comparison between each code candidate and the query, ensuring that only the most relevant and accurate candidates are considered. Finally, the top-K re-ranked code snippets, together with the query, serve as input to the code generation model. Extensive experiments evaluating the code generated by RRGcode demonstrate state-of-the-art performance on code generation tasks.
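The three stages described in the abstract (retrieve, re-rank, then build the generation input) can be sketched as a small pipeline. This is a minimal, hypothetical sketch, not the authors' implementation: the function names (`retrieve`, `rerank`, `build_generation_input`, `rrg_pipeline`) are illustrative, Jaccard token overlap stands in for the retrieval model, cosine similarity over token counts stands in for the semantic re-ranking model, and the call to an actual code generation model is omitted.

```python
"""Hypothetical sketch of a retrieve -> re-rank -> generation-input pipeline.

The scoring functions are simple lexical stand-ins, NOT the retrieval or
re-ranking models used by RRGcode.
"""
import re
from collections import Counter
from math import sqrt
from typing import Callable


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumerics, so snake_case names
    # such as "add_numbers" become ["add", "numbers"].
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_score(query: str, code: str) -> float:
    # Stage-1 stand-in: cheap set-overlap score for coarse retrieval.
    q, d = set(tokenize(query)), set(tokenize(code))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def cosine_score(query: str, code: str) -> float:
    # Stage-2 stand-in: cosine similarity of token-count vectors.
    # A real system would use a learned semantic re-ranker here.
    q, d = Counter(tokenize(query)), Counter(tokenize(code))
    dot = sum(q[t] * d[t] for t in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], n: int) -> list[str]:
    # Stage 1: keep the n highest-scoring snippets from the whole corpus.
    # sorted() is stable even with reverse=True, so ties keep corpus order.
    return sorted(corpus, key=lambda c: jaccard_score(query, c), reverse=True)[:n]


def rerank(query: str, candidates: list[str], k: int,
           score_fn: Callable[[str, str], float] = cosine_score) -> list[str]:
    # Stage 2: re-score only the retrieved candidates with a pluggable,
    # finer-grained scorer and keep the top k.
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:k]


def build_generation_input(query: str, snippets: list[str]) -> str:
    # Stage 3 input: the re-ranked examples followed by the task description.
    parts = [f"# Example {i}\n{s}" for i, s in enumerate(snippets, start=1)]
    parts.append(f"# Task\n{query}")
    return "\n\n".join(parts)


def rrg_pipeline(query: str, corpus: list[str], n: int = 10, k: int = 3) -> str:
    candidates = retrieve(query, corpus, n)
    top_k = rerank(query, candidates, k)
    # The returned text would be passed to a code generation model (not shown).
    return build_generation_input(query, top_k)


if __name__ == "__main__":
    demo_corpus = [
        "def add(a, b): return a + b",
        "def reverse_string(s): return s[::-1]",
        "def add_numbers(x, y): return x + y",
        "class Stack: pass",
    ]
    print(rrg_pipeline("add two numbers", demo_corpus, n=3, k=2))
```

Keeping the re-ranker as a pluggable `score_fn` mirrors the two-stage idea: the cheap scorer only has to narrow the corpus to `n` candidates, so a more expensive comparison (for example a learned cross-encoder, which is an assumption here rather than something the abstract specifies) needs to run on those candidates alone.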