Insights from open source software supply chains (keynote).
ESEC/SIGSOFT FSE(2019)
摘要
Open Source Software (OSS) forms an infrastructure on which numerous (often critical) software applications are based. Substantial research was done to investigate central projects such as Linux kernel but we have only a limited understanding of how the periphery of the larger OSS ecosystem is interconnected through technical dependencies, code sharing, and knowledge flows. We aim to close this gap by a) creating a nearly complete and rapidly updateable collection of version control data for FLOSS projects; b) by cleaning, correcting, and augmenting the data to measure several types of dependencies among code, developers, and projects; c) by creating models that rely on the resulting supply chains to investigate structural and dynamic properties of the entire OSS. The current implementation is capable of being updated each month, occupies over 300Tb of disk space with 1.5B commits and 12B git objects. Highly accurate algorithms to correct identity data and extract dependencies from the source code are used to characterize the current structure of OSS and the way it has evolved. In particular, models of technology spread demonstrate the implicit factors developers use when choosing software components. We expect the resulting research platform will both spur investigations on how the huge periphery in OSS both sustains and is sustained by the central OSS projects and, as a result, will increase resiliency and effectiveness of the OSS.
更多查看译文
关键词
Empirical Software Engineering,Software Ecosystems,Software Measurement
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络