Multi-query Optimization in a Scan-based Relational Main-memory Table

user-5aceb7ef530c7001b97ba534(2010)

Abstract
In database systems, the term multi-query optimization traditionally refers to finding common sub-expressions among multiple queries and sharing intermediate results between them. In a scan-based relational main-memory table, however, the task is a completely different one. There is just one table, which is scanned continuously, and during the scan a large set of queries is executed concurrently on each record. In such a system, queries are typically small, and sharing intermediate results is difficult. The system also has no indexes on the data; instead, it builds predicate indexes on the set of active queries. Here, multi-query optimization refers to finding a set of indexes such that the cost of concurrently executing these queries is minimized. Such an optimization consists of two parts: estimating the cost of a set of indexes, and enumerating possible sets of indexes. With a precise cost estimate, one can accurately pick the fastest set from the enumerated index sets. Cost estimation requires statistics about the data in the table. The thesis shows how such statistics can be gathered and provided in a scan-based system, using data structures and algorithms known from streaming applications. Furthermore, the thesis presents an elaborate cost estimation model based on the CPU costs of different index types. Finally, the thesis shows that the optimization problem itself is NP-hard, and it presents approximation algorithms for finding good index sets.
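
The idea of indexing the queries rather than the data can be illustrated with a minimal sketch. It assumes that every query is a conjunction of equality predicates and that a single attribute is indexed; the function names, the dictionary-based record layout, and the toy cost measure (the number of query evaluations per scan) are all assumptions made for illustration and are not the thesis's cost model or algorithms.

```python
from collections import defaultdict


def build_predicate_index(queries, attr):
    """Index the queries (not the data) on one attribute.

    Returns a map from attribute value to the ids of queries that require
    attr == value, plus the ids of queries with no predicate on attr.
    """
    index = defaultdict(list)
    unindexed = []
    for qid, preds in queries.items():
        if attr in preds:
            index[preds[attr]].append(qid)
        else:
            unindexed.append(qid)
    return index, unindexed


def matches(record, preds):
    """True if the record satisfies every equality predicate of a query."""
    return all(record.get(a) == v for a, v in preds.items())


def shared_scan(table, queries, attr):
    """One pass over the table that serves all queries at once.

    For each record, only queries found through the predicate index, plus
    the unindexed queries, are evaluated. Returns the per-query results and
    the number of query evaluations, used here as a stand-in for CPU cost.
    """
    index, unindexed = build_predicate_index(queries, attr)
    results = {qid: [] for qid in queries}
    evaluations = 0
    for record in table:
        candidates = index.get(record.get(attr), []) + unindexed
        for qid in candidates:
            evaluations += 1
            if matches(record, queries[qid]):
                results[qid].append(record)
    return results, evaluations


def pick_index_attribute(table, queries, attrs):
    """Enumerate candidate single-attribute indexes and keep the cheapest.

    This measures the cost by actually running the scan; a real optimizer
    would estimate it from statistics instead.
    """
    return min(attrs, key=lambda a: shared_scan(table, queries, a)[1])
```

In this sketch the cost is measured by running the scan for each candidate, which is only feasible for toy inputs. The abstract's point is that the cost must be estimated from statistics, and that choosing among sets of several indexes, rather than single attributes as here, is what makes the problem hard.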
Keywords
Query optimization, Optimization problem, Approximation algorithm, Index set, Data structure, Cost estimate, Data mining, Computer science, Predicate (grammar)