Exploring Regular Expression Evolution

2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)(2019)

引用 26|浏览49
暂无评分
摘要
Although there are tools to help developers understand the matching behaviors between a regular expression and a string, regular-expression related faults are still common. Learning developers’ behavior through the change history of regular expressions can identify common edit patterns, which can inform the creation of mutation and repair operators to assist with testing and fixing regular expressions. In this work, we explore how regular expressions evolve over time, focusing on the characteristics of regular expression edits, the syntactic and semantic difference of the edits, and the feature changes of edits. Our exploration uses two datasets. First, we look at GitHub projects that have a regular expression in their current version and look back through the commit logs to collect the regular expressions’ edit history. Second, we collect regular expressions composed by study participants during problem-solving tasks. Our results show that 1) 95% of the regular expressions from GitHub are not edited, 2) most edited regular expressions have a syntactic distance of 4-6 characters from their predecessors, 3) over 50% of the edits in GitHub tend to expand the scope of regular expression, and 4) the number of features used indicates the regular expression language usage increases over time. This work has implications for supporting regular expression repair and mutation to ensure test suite quality.
更多
查看译文
关键词
Semantics,History,Testing,Syntactics,Maintenance engineering,Computer bugs,Tools
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要