AnalyticDB-V: a hybrid analytical engine towards query fusion for structured and unstructured data

Hosted Content (2020)

Abstract
With the explosive growth of unstructured data (such as images, video, and audio), unstructured data analytics has become widespread across a wide range of real-world applications. Many database systems have started to incorporate unstructured data analysis to meet this demand. However, most systems treat queries over unstructured data and queries over structured data as disjoint tasks, so hybrid queries (i.e., queries involving both data types) are not yet fully supported.

In this paper, we present a hybrid analytical engine developed at Alibaba, named AnalyticDB-V (ADBV), to meet these emerging demands. ADBV offers an interface that lets users express hybrid queries with SQL semantics by converting unstructured data to high-dimensional vectors. ADBV adopts the lambda framework and leverages approximate nearest neighbor search (ANNS) techniques to support hybrid data analytics. Moreover, a novel ANNS algorithm is proposed to improve accuracy on the large-scale vectors that represent massive unstructured data. All ANNS algorithms are implemented as physical operators in ADBV, and accuracy-aware cost-based optimization techniques are proposed to identify effective execution plans. Experimental results on both public and in-house datasets show that ADBV achieves superior performance and is effective. ADBV has been successfully deployed on Alibaba Cloud to provide hybrid query processing services for various real-world applications.
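
To make the notion of a hybrid query concrete, the following is a minimal sketch of one, assuming unstructured items have already been converted to embedding vectors. It is not ADBV's implementation: it uses an exact brute-force scan rather than any ANNS algorithm, and the names `Item`, `hybrid_query`, `category`, and `price`, along with the sample data, are hypothetical and chosen only for illustration. It filters rows by structured predicates and then ranks the survivors by vector distance to a query vector.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    # Hypothetical schema: two structured columns plus one vector column.
    item_id: int
    category: str
    price: float
    embedding: List[float]


def l2_distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_query(items: List[Item], query_vec: List[float],
                 category: str, max_price: float, k: int) -> List[Item]:
    """Exact (brute-force) hybrid query: structured filter, then top-k by distance.

    This stands in for an ANNS operator; a real engine would use an index
    and would choose between filter-first and search-first plans.
    """
    # Step 1: apply the structured predicates.
    candidates = [it for it in items
                  if it.category == category and it.price <= max_price]
    # Step 2: rank the remaining rows by distance to the query vector.
    candidates.sort(key=lambda it: l2_distance(it.embedding, query_vec))
    return candidates[:k]


# Made-up sample data for illustration only.
ITEMS = [
    Item(1, "dress", 40.0, [0.1, 0.9]),
    Item(2, "dress", 120.0, [0.1, 0.8]),   # excluded: price too high
    Item(3, "shoes", 30.0, [0.1, 0.9]),    # excluded: wrong category
    Item(4, "dress", 55.0, [0.7, 0.2]),
    Item(5, "dress", 25.0, [0.2, 0.7]),
]

result = hybrid_query(ITEMS, [0.1, 0.85], category="dress", max_price=100.0, k=2)
result_ids = [it.item_id for it in result]
print(result_ids)
```

Filtering first and searching second is only one possible plan. When the structured predicate is not selective, searching the vector index first and filtering afterwards may be cheaper, which is the kind of trade-off a cost-based optimizer would weigh.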