FlashRelate: Extracting Relational Data From Semi-Structured Spreadsheets Using Examples

PLDI (2015)

Abstract
With hundreds of millions of users, spreadsheets are one of the most important end-user applications. Spreadsheets are easy to use and allow users great flexibility in storing data. This flexibility comes at a price: users often treat spreadsheets as a poor man's database, leading to creative solutions for storing high-dimensional data. The trouble arises when users need to answer queries with their data. Data manipulation tools make strong assumptions about data layouts and cannot read these ad-hoc databases. Converting data into the appropriate layout requires programming skills or a major investment in manual reformatting. The effect is that a vast amount of real-world data is "locked-in" to a proliferation of one-off formats.

We introduce FLASHRELATE, a synthesis engine that lets ordinary users extract structured relational data from spreadsheets without programming. Instead, users extract data by supplying examples of output relational tuples. FLASHRELATE uses these examples to synthesize a program in FLARE. FLARE is a novel extraction language that extends regular expressions with geometric constructs. An interactive user interface on top of FLASHRELATE lets end users extract data by point-and-click. We demonstrate that correct FLARE programs can be synthesized in seconds from a small set of examples for 43 real-world scenarios. Finally, our case study demonstrates FLASHRELATE's usefulness addressing the widespread problem of data trapped in corporate and government formats.
Keywords
spreadsheets, relational data, data extraction, program synthesis
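
To make the abstract's idea of combining content patterns with spatial relationships more concrete, here is a minimal hand-written sketch. It is not FLARE syntax and involves no synthesis: the sheet contents, the `extract_tuples` function, and the cross-tab layout are all hypothetical, chosen only to show how a regular-expression test on a cell can be paired with a geometric rule ("same row, leftmost column"; "same column, top row") to produce relational tuples.

```python
import re

# Hypothetical toy sheet: a cross-tab with years across the top row.
SHEET = [
    ["Country", "2001", "2002"],
    ["Albania", "1000", "1050"],
    ["Austria", "3139", ""],
]


def extract_tuples(sheet):
    """Return (country, year, value) tuples from a cross-tab grid.

    Illustration only: each value cell is found by a content pattern,
    and its two header cells are found by position relative to it.
    """
    value_re = re.compile(r"\d+")    # content constraint on the value cell
    year_re = re.compile(r"\d{4}")   # content constraint on the column header
    tuples = []
    for r, row in enumerate(sheet):
        for c, cell in enumerate(row):
            # Skip header row, header column, and non-numeric or empty cells.
            if r == 0 or c == 0 or not value_re.fullmatch(cell):
                continue
            year = sheet[0][c]   # spatial rule: same column, top row
            country = row[0]     # spatial rule: same row, leftmost column
            if year_re.fullmatch(year) and country:
                tuples.append((country, year, cell))
    return tuples


if __name__ == "__main__":
    for t in extract_tuples(SHEET):
        print(t)
```

In the system the abstract describes, a user would not write such a program by hand; they would supply a few example output tuples and the extraction program would be synthesized from them.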