Large-scale high-dimensional nearest neighbor search using flash memory with in-store processing

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) (2015)

Abstract
Important modern datasets, such as images, videos, protein sequences, and text, usually contain very high-dimensional information from the search point of view. Nearest-neighbor search is one of the most fundamental building blocks for dealing with large amounts of data. It is the problem of finding the points in a database that are most similar to a query data point under some distance metric. There is a large body of work on algorithms for nearest-neighbor search over large high-dimensional datasets. Since these algorithms invariably involve random access to data, most existing implementations ensure that the data is stored in DRAM and does not spill into secondary storage such as hard disks. However, the immense size of modern datasets often requires hundreds of computers to accommodate the dataset in DRAM. An alternative to such a system is a much smaller cluster that stores the dataset in flash memory (instead of DRAM) and has in-store computing capability. In this paper, we build and demonstrate the performance of high-dimensional nearest-neighbor search on a flash-based system with FPGA acceleration and show that it sometimes exceeds the performance of a DRAM-based solution. We chose two example applications, images and documents, for this demonstration. Since flash storage, in comparison to DRAM, is an order of magnitude cheaper and consumes an order of magnitude less power, a flash-based solution for nearest-neighbor search offers a viable and attractive alternative.
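To make the problem definition concrete, the following is a minimal sketch of exact nearest-neighbor search by brute-force linear scan. It is not the method evaluated in the paper, and it says nothing about the flash or FPGA design; it only illustrates "finding the points in a database that are most similar to a query under some distance metric". The function names, the choice of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(database, query, k=1, distance=euclidean):
    """Return the k (index, distance) pairs closest to `query`, nearest first.

    Brute-force linear scan: every database point is compared to the query,
    so the cost grows with both the number of points and their dimension.
    """
    scored = ((i, distance(point, query)) for i, point in enumerate(database))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy 4-dimensional database (hypothetical data, for illustration only).
database = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.9, 1.1, 1.0, 0.8],
    [5.0, 5.0, 5.0, 5.0],
]
query = [1.0, 1.0, 1.0, 0.9]

# The two database points closest to the query, as (index, distance) pairs.
result = nearest_neighbors(database, query, k=2)
```

A linear scan like this touches every stored vector for each query, which is why the storage medium holding the dataset matters once the data no longer fits in a single machine's DRAM.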
Keywords
large-scale high-dimensional nearest neighbor search, flash memory, in-store processing, query data point, distance metric, DRAM, dynamic random access memory, dataset accommodation, flash-based system, FPGA acceleration, field programmable gate array