Java
Computer science
Test suite
Benchmark
Suite
Software bug
Software engineering
Implementation
Empirical research
Software
Proportion (ratio)
Test case
Programming language
Machine learning
Physics
Geography
Archaeology
Regression analysis
Philosophy
Epistemology
History
Quantum mechanics
Geodesy
Authors
Thomas Durieux, Fernanda Madeiral, Matías Martínez, Rui Abreu
Identifier
DOI:10.1145/3338906.3338911
Abstract
In the past decade, research on test-suite-based automatic program repair has grown significantly. Each year, new approaches and implementations are featured in major software engineering venues. However, most of those approaches are evaluated on a single benchmark of bugs, which are also rarely reproduced by other researchers. In this paper, we present a large-scale experiment using 11 Java test-suite-based repair tools and 2,141 bugs from 5 benchmarks. Our goal is to have a better understanding of the current state of automatic program repair tools on a large diversity of benchmarks. Our investigation is guided by the hypothesis that the repairability of repair tools might not be generalized across different benchmarks. We found that the 11 tools 1) are able to generate patches for 21% of the bugs from the 5 benchmarks, and 2) have better performance on Defects4J compared to other benchmarks, by generating patches for 47% of the bugs from Defects4J compared to 10-30% of bugs from the other benchmarks. Our experiment comprises 23,551 repair attempts, which we used to find causes of non-patch generation. These causes are reported in this paper, which can help repair tool designers to improve their approaches and tools.
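The abstract describes test-suite-based repair: a tool takes a buggy program plus its test suite, searches a space of candidate patches, and accepts a candidate only if it makes the whole suite pass. The sketch below illustrates that loop in miniature; the buggy method, the hand-written test triples, and the three-candidate search space are all illustrative assumptions, not taken from any of the 11 tools evaluated in the paper.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

public class RepairSketch {
    // Hypothetical buggy program: should return max(a, b),
    // but the comparison is inverted and it returns the smaller value.
    static int buggyMax(int a, int b) {
        return (a < b) ? a : b; // bug
    }

    // The test suite acts as the repair oracle: each triple is
    // {input a, input b, expected output}.
    static boolean passesAllTests(IntBinaryOperator candidate) {
        int[][] tests = { {3, 7, 7}, {9, 2, 9}, {5, 5, 5} };
        for (int[] t : tests) {
            if (candidate.applyAsInt(t[0], t[1]) != t[2]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A tiny, hand-picked search space of candidate patches
        // (real tools derive candidates by mutating the program).
        List<IntBinaryOperator> candidates = List.of(
            (a, b) -> (a < b) ? a : b,  // the original, buggy version
            (a, b) -> (a > b) ? a : b,  // comparison operator flipped
            (a, b) -> a + b             // an unrelated mutation
        );
        for (int i = 0; i < candidates.size(); i++) {
            if (passesAllTests(candidates.get(i))) {
                System.out.println("patch found: candidate " + i);
                return;
            }
        }
        System.out.println("no patch found");
    }
}
```

A run prints `patch found: candidate 1`, since only the flipped comparison passes every test. The "causes of non-patch generation" studied in the paper correspond to this loop exiting without any candidate passing, or failing before the search even starts.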