An Exploratory Study of Functional Redundancy in Code Repositories

2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM)(2017)

引用 12|浏览89
暂无评分
摘要
In large code repositories, the probability of functions to repeat across projects is high. This type of functional redundancy (FR) is desirable for recent code reuse and repair approaches. Yet, FR is hard to measure because it is closely related to program equivalence, which is an undecidable problem. This is one of the reasons most studies that investigate redundancy focus on syntactic rather than semantic replication (e.g., cloning). In this paper we evaluate the extent of FR in a code repository with 68 Java projects taken randomly from SourceForge. Our technique approximates function similarity by first searching for methods that possess similar interfaces (return type, name, and parameter types). We then execute these methods to verify which candidate pairs have matching outputs for a given sample of inputs. Some recent studies have also focused on this type of semantic replication, but our detection approach is generally cheaper and more precise, because it focuses on methods and uses interfaces to reduce the search space. Although our scope is restricted to static methods, which makes our results conservative, our findings are promising. In particular, we found 984 pairs of redundant methods, and 28 out of the 68 (41.17%) projects in the repository presented redundancy. Moreover, the majority of redundant methods for which we had access to the source code did not refer to textual clones (only one redundant method pair referred to replicated code). Our study also indicates that the proposed redundancy detection approach has high precision and is generally inexpensive (only four executions were required per method to attain 100% precision).
更多
查看译文
关键词
functional redundancy,code repository,FR,semantic replication,function similarity,return type,parameter types,Java projects,code reuse,code repair approach,SourceForge,name parameter,redundancy detection approach,replicated code,source code,static methods
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要