Simplifying Access to Large-scale Structured Datasets by Meta-Profiling with Scalable Training Set Enrichment

Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), 2022

Abstract
Accessing large-scale structured datasets such as WDC [19], which contain millions of tables from hundreds of thousands of sources, is very challenging [7, 8, 16]. Even when only one topic (e.g., job postings) is of interest, Jobs tables from different sources follow hundreds of different schemas, which significantly complicates both finding and querying them. Here we demonstrate our scalable Meta-data profiler, which constructs a standardized interface to a topic of interest in large-scale structured datasets using deep learning and our new unsupervised, scalable training set enrichment algorithm. This interface, called a Meta-profile, is a meta-data summary for each topic that is representative of the entire dataset. It helps data scientists and end users access all relevant topical tables, even in ultra-large-scale datasets such as WDC, where this would otherwise be very difficult or impossible [25].
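The abstract does not describe the profiler's internals, so the following is only a toy illustration of what a per-topic meta-data summary over heterogeneous schemas could look like. It is not the authors' method: the hand-written `SYNONYMS` map is a hypothetical stand-in for the learned, embedding-based matching the paper uses, and the names `normalize`, `meta_profile`, and `jobs_schemas` are illustrative assumptions. The sketch maps each table's column headers to standardized attributes and reports the fraction of tables exposing each attribute.

```python
from collections import Counter
import re

# HYPOTHETICAL synonym map: a stand-in for a learned (e.g. embedding-based)
# attribute matcher; not part of the original paper.
SYNONYMS = {
    "jobtitle": "title",
    "position": "title",
    "title": "title",
    "company": "employer",
    "employer": "employer",
    "hiringorganization": "employer",
    "location": "location",
    "city": "location",
    "salary": "salary",
    "pay": "salary",
}


def normalize(header: str) -> str:
    """Lowercase and drop non-alphanumerics, e.g. 'Job Title' -> 'jobtitle'."""
    return re.sub(r"[^a-z0-9]", "", header.lower())


def meta_profile(schemas: list[list[str]]) -> dict[str, float]:
    """Return the fraction of tables that expose each standardized attribute."""
    counts: Counter[str] = Counter()
    for headers in schemas:
        # Map headers to standardized names; count each attribute once per table.
        attrs = {SYNONYMS.get(normalize(h), normalize(h)) for h in headers}
        counts.update(attrs)
    n = len(schemas)
    return {attr: c / n for attr, c in counts.most_common()}


# Three hypothetical Jobs tables with different schemas.
jobs_schemas = [
    ["Job Title", "Company", "City"],
    ["position", "Employer", "Salary"],
    ["title", "hiringOrganization", "location", "pay"],
]
print(meta_profile(jobs_schemas))
```

Here every table exposes a title and an employer, while location and salary each appear in two of the three tables, so a single summary stands in for three distinct schemas.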
Keywords
Profiling, Heterogeneous Structured Data, Deep Learning, Embeddings