Tuning for software analytics

Information and Software Technology(2016)

Abstract
Context: Data miners have been widely used in software engineering to, say, generate defect predictors from static code measures. Such static code defect predictors perform well compared to manual methods, and they are both easy to use and useful. But one of the "black arts" of data mining is setting the tunings that control the miner.

Objective: We seek a simple, automatic, and very effective method for finding those tunings.

Method: For each experiment on a different data set (drawn from open-source Java systems), we first ran differential evolution as an optimizer to explore the tuning space, then tested the resulting tunings on hold-out data.

Results: Contrary to our prior expectations, we found that finding these tunings was remarkably simple: it required only tens, not thousands, of attempts to obtain very good results. For example, when learning software defect predictors, this method can quickly find tunings that raise detection precision from 0% to 60%.

Conclusion: Since (1) the improvements are so large and (2) the tuning is so simple, we need to change standard methods in software analytics. At least for defect prediction, it is no longer enough to just run a data miner and present the result without conducting a tuning optimization study. The implication for other kinds of analytics is now an open and pressing issue.
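The Method step above can be sketched as a minimal differential evolution (DE/rand/1/bin) loop that maximizes a hold-out score over a box-bounded tuning space. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the `pop_size`, `f`, `cr`, and `generations` defaults, the tuning bounds, and the `toy_holdout_score` objective are all hypothetical. In a real study, the objective would train a learner such as CART or a random forest with the candidate tuning and return, for example, its precision on hold-out data.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical tuning space: (low, high) bounds for each numeric tuning parameter.
Bounds = List[Tuple[float, float]]


def differential_evolution(
    score: Callable[[List[float]], float],
    bounds: Bounds,
    pop_size: int = 10,      # assumed value, for illustration only
    f: float = 0.75,         # assumed mutation (differential weight) factor
    cr: float = 0.3,         # assumed crossover probability
    generations: int = 5,    # assumed number of generations
    seed: int = 1,
) -> Tuple[List[float], float, int]:
    """Maximize `score` over `bounds` with DE/rand/1/bin.

    Returns (best tuning, its score, number of score evaluations)."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 candidates")
    rng = random.Random(seed)
    dims = len(bounds)
    # Initial population: uniform random tunings inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [score(p) for p in pop]
    evaluations = pop_size
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct candidates, none of them the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dims)  # guarantee at least one mutated dimension
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip back into the tuning space
                else:
                    v = pop[i][d]
                trial.append(v)
            trial_score = score(trial)
            evaluations += 1
            # Greedy selection: keep the trial only if it scores at least as well.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best], evaluations


def toy_holdout_score(tuning: List[float]) -> float:
    """Hypothetical stand-in for 'train a learner with this tuning, score it on
    hold-out data'. Peaks at threshold=0.3, depth=12; not real defect data."""
    threshold, depth = tuning
    return -((threshold - 0.3) ** 2 + ((depth - 12.0) / 10.0) ** 2)


if __name__ == "__main__":
    space = [(0.0, 1.0), (1.0, 20.0)]  # hypothetical (threshold, max_depth) bounds
    best, best_score, evals = differential_evolution(toy_holdout_score, space)
    print("best tuning:", best, "score:", round(best_score, 4))
    print("evaluations:", evals)  # 10 initial + 10 per generation * 5 generations = 60
```

Each generation costs only `pop_size` evaluations, so a budget like the one above stays in the tens of attempts, the scale the Results paragraph reports. Integer-valued tunings (e.g. a tree's maximum depth) would need rounding inside the objective; this sketch treats every dimension as continuous.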
Keywords
Defect prediction, CART, Random forest, Differential evolution, Search-based software engineering