
In-RDBMS Hardware Acceleration of Advanced Analytics

Proceedings of the VLDB Endowment (2018)

Georgia Institute of Technology | University of Wisconsin-Madison | University of California, San Diego

Cited 61 | Views 147
Abstract
The data revolution is fueled by advances in machine learning, databases, and hardware design. Programmable accelerators are making their way into each of these areas independently. As such, there is a void of solutions that enable hardware acceleration at the intersection of these disjoint fields. This paper sets out to be the initial step towards a unifying solution for in-Database Acceleration of Advanced Analytics (DAnA). Deploying specialized hardware, such as FPGAs, for in-database analytics currently requires hand-designing the hardware and manually routing the data. Instead, DAnA automatically maps a high-level specification of advanced analytics queries to an FPGA accelerator. The accelerator implementation is generated for a User Defined Function (UDF), expressed as part of an SQL query using a Python-embedded Domain-Specific Language (DSL). To realize an efficient in-database integration, DAnA accelerators contain novel hardware structures, Striders, that directly interface with the buffer pool of the database. Striders extract, cleanse, and process the training data tuples, which are then consumed by a multi-threaded FPGA engine that executes the analytics algorithm. We integrate DAnA with PostgreSQL to generate hardware accelerators for a range of real-world and synthetic datasets running diverse ML algorithms. Results show that DAnA-enhanced PostgreSQL provides, on average, 8.3× end-to-end speedup for real datasets, with a maximum of 28.2×. Moreover, DAnA-enhanced PostgreSQL is, on average, 4.0× faster than the multi-threaded Apache MADlib running on Greenplum. DAnA provides these benefits while hiding the complexity of hardware design from data scientists and allowing them to express the algorithm in ≈30-60 lines of Python.
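To give a feel for the kind of analytics algorithm such a UDF might describe, here is a minimal sketch of linear regression trained by stochastic gradient descent over tuples. It is plain Python and not the DAnA DSL, whose syntax the abstract does not show; the names `gradient`, `train_linear_regression`, and `rows`, along with the learning rate and epoch count, are assumptions made for illustration only.

```python
# Hypothetical sketch in plain Python. This is NOT the DAnA DSL; all names
# and hyperparameters below are illustrative assumptions.
from typing import List, Sequence, Tuple

# One training tuple: (feature vector, label), as a table row might supply it.
Row = Tuple[Sequence[float], float]


def gradient(weights: Sequence[float], features: Sequence[float],
             label: float) -> List[float]:
    """Gradient of the squared error 0.5 * (w.x - y)^2 for a single tuple."""
    error = sum(w * x for w, x in zip(weights, features)) - label
    return [error * x for x in features]


def train_linear_regression(rows: Sequence[Row], n_features: int,
                            learning_rate: float = 0.1,
                            epochs: int = 300) -> List[float]:
    """Update the model once per tuple, making several passes over the data."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, label in rows:
            grad = gradient(weights, features, label)
            weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


# Toy data generated from y = 2 + 3x; the first feature is a constant bias term.
rows = [((1.0, 0.0), 2.0), ((1.0, 1.0), 5.0), ((1.0, 2.0), 8.0)]
weights = train_linear_regression(rows, n_features=2)
```

Because the toy data follows y = 2 + 3x exactly, the learned weights should land close to (2, 3). In the system the abstract describes, the per-tuple update would run on the FPGA engine and the tuples would be fed from the buffer pool, rather than iterated over in Python as in this sketch.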
Related Papers

Integrating DBMS and Parallel Data Mining Algorithms for Modern Many-Core Processors.

International Conference on Data Analytics and Management in Data Intensive Domains 2017

Cited by 26

Polystore++: Accelerated Polystore System for Heterogeneous Workloads.

2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS) 2019

Cited by 5

Database Meets Artificial Intelligence: A Survey.

IEEE Transactions on Knowledge and Data Engineering 2020

Cited by 200

Gorgon: Accelerating Machine Learning from Relational Data

2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA 2020) 2020

Cited by 27

Distributed Learning Systems with First-Order Methods.

Foundations and Trends in Databases 2020

Cited by 31

Accelerating generalized linear models with MLWeaving

Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang
Very Large Data Bases 2019

Cited by 12

Hardware Acceleration for DBMS Machine Learning Scoring: Is It Worth the Overheads?

2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) 2021

Cited by 2

AI Meets Database: AI4DB and DB4AI

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data 2021

Cited by 111

TCUDB: Accelerating Database with Tensor Processors

Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22) 2022

Cited by 33

Database parallelism, big data and analytics, deep learning

Alexander Thomasian
Storage Systems 2022

Cited by 0

Data Management for Machine Learning: A Survey

IEEE Transactions on Knowledge and Data Engineering 2022

Cited by 14

Sailfish: Exploring Heterogeneous Query Acceleration on Discrete CPU-FPGA Architecture

2023 IEEE 39th International Conference on Data Engineering Workshops (ICDEW) 2023

Cited by 0

Data Management in Machine Learning Systems

Synthesis Lectures on Data Management 2019

Cited by 41
