Characterizing Leveraged Stack Overflow Posts

2019 19th International Working Conference on Source Code Analysis and Manipulation (SCAM), 2019

Abstract
Stack Overflow is the most popular question-and-answer website on computer programming, with more than 2.5M users, 16M questions, and a new answer posted, on average, every five seconds. This wide availability of data has led researchers to develop techniques to mine Stack Overflow posts, with the aim of finding and recommending posts that contain information useful to developers. However, and not surprisingly, not every Stack Overflow post is useful from a developer's perspective. We empirically investigate the characteristics of "useful" Stack Overflow posts. The underlying assumption of our study is that posts that developers used in the past (that is, referenced in source code) are likely to be useful. We refer to these posts as leveraged posts. We study the characteristics of leveraged posts as opposed to non-leveraged ones, focusing on community aspects (e.g., the reputation of the user who authored the post), the quality of the included code snippets (e.g., complexity), and the quality of the post's textual content (e.g., readability). We then use these features to build a prediction model that automatically identifies posts likely to be leveraged by developers. Results of the study indicate that post metadata (e.g., the number of comments received by the answer) is particularly useful for predicting whether a post has been leveraged, whereas code readability appears to be less useful. The resulting classifier identifies leveraged posts with a precision of 65% and a recall of 49%, and non-leveraged ones with a precision of 95% and a recall of 97%. This paves the way toward the automatic identification of "high-quality content" in Stack Overflow.
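To put the reported precision and recall figures on a single scale, the sketch below derives an F1 score (the harmonic mean of precision and recall) for each class. The F1 values are not stated in the abstract; they are computed here only from the four percentages it reports, and the function name `f1_score` is a hypothetical helper introduced for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall exactly as reported in the abstract.
reported = {
    "leveraged": (0.65, 0.49),
    "non-leveraged": (0.95, 0.97),
}

# Derived values, not figures taken from the paper.
for label, (precision, recall) in reported.items():
    print(f"{label}: F1 = {f1_score(precision, recall):.3f}")
```

Under these inputs the derived F1 is roughly 0.56 for leveraged posts and 0.96 for non-leveraged ones, which makes the gap between the two classes easier to see at a glance.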
Keywords
Recommender Systems, Q&A Forums, Reuse