SemanticCloneBench: A Semantic Code Clone Benchmark using Crowd-Source Knowledge

Farouq Al-Omari, Chanchal K. Roy, Tonghao Chen

2020 IEEE 14th International Workshop on Software Clones (IWSC), 2020

Abstract

Not only newly proposed code clone detection techniques but also existing techniques and tools need to be evaluated and compared. This evaluation can be done by manually assessing the reported clones or by using benchmarks. The main limitations of available benchmarks are that they are restricted to one programming language; they have a limited number of clone pairs, confined to the selected system(s); they require manual validation; and they do not support all types of code clones. To overcome these limitations, we propose a methodology to generate a wide range of semantic clone benchmarks for different programming languages with minimal human validation. Our technique is based on the knowledge provided by developers who participate in the crowd-sourced information website Stack Overflow. We apply automatic filtering, selection, and validation to the source code in Stack Overflow answers. Finally, we build a semantic code clone benchmark of 4,000 clone pairs for the languages Java, C, C#, and Python.
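To make the notion of a semantic clone pair concrete, the following is a minimal sketch and not the authors' pipeline. It assumes two hypothetical answers to the same hypothetical question ("sum the digits of an integer"), pairs them up, and keeps the pair only if both functions agree on a handful of sample inputs. The function names, the sample inputs, and the agreement check are all illustrative assumptions; the abstract does not specify how filtering, selection, or validation are implemented.

```python
from itertools import combinations


def sum_digits_loop(n: int) -> int:
    """Hypothetical answer A: peel off digits arithmetically with divmod."""
    total = 0
    n = abs(n)
    while n:
        n, digit = divmod(n, 10)
        total += digit
    return total


def sum_digits_str(n: int) -> int:
    """Hypothetical answer B: convert to text and add up the characters."""
    return sum(int(ch) for ch in str(abs(n)))


def candidate_pairs(answers):
    """Pair up every two answers given to the same (hypothetical) question."""
    return list(combinations(answers, 2))


def behave_alike(f, g, inputs) -> bool:
    """True if f and g agree on every sample input (evidence, not proof)."""
    return all(f(x) == g(x) for x in inputs)


# Illustrative inputs only; a real validation step would need far more care.
SAMPLE_INPUTS = [0, 7, 42, 999, -305, 10**12]

pairs = candidate_pairs([sum_digits_loop, sum_digits_str])
validated = [
    (f.__name__, g.__name__)
    for f, g in pairs
    if behave_alike(f, g, SAMPLE_INPUTS)
]
print(validated)
```

The two functions share almost no syntax yet compute the same result, which is what distinguishes a semantic clone from a copy-and-paste clone. Agreement on sample inputs is used here only as a stand-in for the validation step mentioned in the abstract.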

Keywords

Semantic clone, Functional equivalent, Stack Overflow, Benchmark
