In-RDBMS Hardware Acceleration of Advanced Analytics

Divya Mahajan, Joon Kyung Kim, Jacob Sacks, Adel Ardalan, Arun Kumar, Hadi Esmaeilzadeh

Proceedings of the VLDB Endowment (2018)

Georgia Inst Technol | Univ Wisconsin Madison | Univ Calif San Diego

Cited 61 | Views 147

Abstract

The data revolution is fueled by advances in machine learning, databases, and hardware design. Programmable accelerators are making their way into each of these areas independently. As such, there is a void of solutions that enable hardware acceleration at the intersection of these disjoint fields. This paper sets out to be the initial step towards a unifying solution for in-Database Acceleration of Advanced Analytics (DAnA). Deploying specialized hardware, such as FPGAs, for in-database analytics currently requires hand-designing the hardware and manually routing the data. Instead, DAnA automatically maps a high-level specification of advanced analytics queries to an FPGA accelerator. The accelerator implementation is generated for a User Defined Function (UDF), expressed as a part of an SQL query using a Python-embedded Domain-Specific Language (DSL). To realize an efficient in-database integration, DAnA accelerators contain a novel hardware structure, Striders, that directly interface with the buffer pool of the database. Striders extract, cleanse, and process the training data tuples that are consumed by a multi-threaded FPGA engine that executes the analytics algorithm. We integrate DAnA with PostgreSQL to generate hardware accelerators for a range of real-world and synthetic datasets running diverse ML algorithms. Results show that DAnA-enhanced PostgreSQL provides, on average, 8.3× end-to-end speedup for real datasets, with a maximum of 28.2×. Moreover, DAnA-enhanced PostgreSQL is, on average, 4.0× faster than the multi-threaded Apache MADLib running on Greenplum. DAnA provides these benefits while hiding the complexity of hardware design from data scientists and allowing them to express the algorithm in ≈30-60 lines of Python.
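The data flow the abstract describes (raw buffer-pool pages in, training tuples out, tuples consumed by an analytics engine) can be illustrated in software. The following is only a minimal sketch under stated assumptions: the page layout, the function names (`make_page`, `extract_tuples`, `sgd_linear_regression`), and the choice of linear regression are all invented for illustration. It is not DAnA's DSL, not PostgreSQL's actual page format, and it runs single-threaded on a CPU rather than on an FPGA.

```python
import struct

# Hypothetical page layout, invented for this sketch (NOT PostgreSQL's real
# heap-page format): a 4-byte tuple count, followed by fixed-width tuples of
# NUM_FEATURES float32 features and one float32 label.
NUM_FEATURES = 2
TUPLE_FMT = "<" + "f" * (NUM_FEATURES + 1)
TUPLE_SIZE = struct.calcsize(TUPLE_FMT)


def make_page(rows):
    """Pack (feature..., label) rows into one raw page of bytes."""
    body = b"".join(struct.pack(TUPLE_FMT, *row) for row in rows)
    return struct.pack("<I", len(rows)) + body


def extract_tuples(page):
    """Walk a raw page and yield (features, label) pairs.

    This plays the role the abstract assigns to Striders: turning page
    bytes into training tuples for the analytics engine.
    """
    (count,) = struct.unpack_from("<I", page, 0)
    for i in range(count):
        *features, label = struct.unpack_from(TUPLE_FMT, page, 4 + i * TUPLE_SIZE)
        yield features, label


def sgd_linear_regression(pages, lr=0.05, epochs=200):
    """Stand-in analytics engine: stochastic gradient descent over all tuples."""
    weights = [0.0] * NUM_FEATURES
    for _ in range(epochs):
        for page in pages:
            for features, label in extract_tuples(page):
                pred = sum(w * x for w, x in zip(weights, features))
                err = pred - label
                weights = [w - lr * err * x for w, x in zip(weights, features)]
    return weights


# Toy data generated from label = 2*x1 - 3*x2, spread over two pages.
pages = [
    make_page([(1.0, 0.0, 2.0), (0.0, 1.0, -3.0)]),
    make_page([(1.0, 1.0, -1.0)]),
]
weights = sgd_linear_regression(pages)
```

In this sketch the extraction step and the learning step are separate functions, mirroring the split between Striders and the execution engine in the abstract; in a software-only setting the same separation lets the page decoder be swapped without touching the learning loop.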

Cited Papers

Integrating DBMS and Parallel Data Mining Algorithms for Modern Many-Core Processors

Timofey Rechkalov, Mikhail Zymbler

International Conference on Data Analytics and Management in Data Intensive Domains 2017

Cited by 26

Accelerating In-Memory Database Selections Using Latency Masking Hardware Threads

Prerna Budhkar, Ildar Absalyamov, Vasileios Zois, Skyler Windh, Walid A. Najjar, Vassilis J. Tsotras

ACM Transactions on Architecture and Code Optimization 2019

Cited by 5

Polystore++: Accelerated Polystore System for Heterogeneous Workloads

Rekha Singhal, Nathan Zhang, Luigi Nardi, Muhammad Shahbaz, Kunle Olukotun

2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS) 2019

Cited by 5

Accelerating Raw Data Analysis with the ACCORDA Software and Hardware Architecture

Yuanwei Fang, Chen Zou, Andrew A. Chien

Proceedings of the VLDB Endowment 2019

Cited by 30

Lowering the Latency of Data Processing Pipelines Through FPGA Based Hardware Acceleration

Muhsen Owaida, Gustavo Alonso, Laura Fogliarini, Anthony Hock-Koon, Pierre-Etienne Melet

Proceedings of the VLDB Endowment 2019

Cited by 59

In-memory Database Acceleration on FPGAs: a Survey

Jian Fang, Yvo T. B. Mulder, Jan Hidders, Jinho Lee, H. Peter Hofstee

The VLDB Journal 2020

Cited by 95

Database Meets Artificial Intelligence: A Survey

Xuanhe Zhou, Chengliang Chai, Guoliang Li, Ji Sun

IEEE Transactions on Knowledge and Data Engineering 2020

Cited by 200

Genesis: A Hardware Acceleration Framework for Genomic Data Analysis

Tae Jun Ham, David Bruns-Smith, Brendan Sweeney, Yejin Lee, Seong Hoon Seo, U. Gyeong Song, Young H. Oh, Krste Asanovic, Jae W. Lee, Lisa Wu Wills

2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA 2020) 2020

Cited by 28

Gorgon: Accelerating Machine Learning from Relational Data

Matthew Vilim, Alexander Rucker, Yaqi Zhang, Sophia Liu, Kunle Olukotun

2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA 2020) 2020

Cited by 27

Distributed Learning Systems with First-Order Methods

Ji Liu, Ce Zhang

Foundations and Trends in Databases 2020

Cited by 31

An Approach to Fuzzy Clustering of Big Data Inside a Parallel Relational DBMS

Mikhail L. Zymbler, Yana Kraeva, Alexander Grents, Anastasiya Perkova, Sachin Kumar

DAMDID/RCDL 2019

Cited by 1

Accelerating generalized linear models with MLWeaving

Zeke Wang, Kaan Kara, Hantian Zhang, Gustavo Alonso, Onur Mutlu, Ce Zhang

Very Large Data Bases 2019

Cited by 12

Accelerating Data Filtering for Database Using FPGA

Xuan Sun, Chun Jason Xue, Jinghuan Yu, Tei-Wei Kuo, Xue Liu

Journal of Systems Architecture 2021

Cited by 9

Accelerating Recommendation System Training by Leveraging Popular Choices

Muhammad Adnan, Yassaman Ebrahimzadeh Maboud, Divya Mahajan, Prashant J. Nair

Proceedings of the VLDB Endowment 2021

Cited by 12

Hardware Acceleration for DBMS Machine Learning Scoring: is It Worth the Overheads?

Zahra Azad, Rathijit Sen, Kwanghyun Park, Ajay Joshi

2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) 2021

Cited by 2

AI Meets Database: AI4DB and DB4AI

Guoliang Li, Xuanhe Zhou, Lei Cao

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data 2021

Cited by 111

Matrix Profile-Based Approach to Industrial Sensor Data Analysis Inside RDBMS

Mikhail Zymbler, Elena Ivanova

Mathematics 2021

Cited by 7

TCUDB: Accelerating Database with Tensor Processors

Yu-Ching Hu, Yuliang Li, Hung-Wei Tseng

Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22) 2022

Cited by 33

Survey on Data Management Technology for Machine Learning

杜小勇, 赵哲, 崔建伟

Journal of Software 2021

Cited by 6

Database parallelism, big data and analytics, deep learning

Alexander Thomasian

Storage Systems 2022

Cited by 0

Exploiting HBM on FPGAs for Data Processing

Runbin Shi, Kaan Kara, Christoph Hagleitner, Dionysios Diamantopoulos, Dimitris Syrivelis, Gustavo Alonso

ACM Transactions on Reconfigurable Technology and Systems 2022

Cited by 4

Data Management for Machine Learning: A Survey

Chengliang Chai, Jiayi Wang, Yuyu Luo, Zeping Niu, Guoliang Li

IEEE Transactions on Knowledge and Data Engineering 2022

Cited by 14

Triton Join: Efficiently Scaling to a Large Join State on GPUs with Fast Interconnects

Clemens Lutz, Sebastian Bress, Steffen Zeuch, Tilmann Rabl, Volker Markl

Proceedings of the 2022 International Conference on Management of Data 2022

Cited by 34

P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs

Hongjing Huang, Yingtao Li, Jie Sun, Xueying Zhu, Jie Zhang, Liang Luo, Jialin Li, Zeke Wang

IEEE Transactions on Parallel and Distributed Systems 2023

Cited by 0

Sailfish: Exploring Heterogeneous Query Acceleration on Discrete CPU-FPGA Architecture

Xing Wei, Yaofeng Tu, Yinjun Han, Zhenghua Chen, Xuecheng Qi, Daojun Hua

2023 IEEE 39th International Conference on Data Engineering Workshops, ICDEW 2023

Cited by 0

Data Management in Machine Learning Systems

Matthias Boehm, Arun Kumar, Jun Yang

Synthesis Lectures on Data Management 2019

Cited by 41

P4SGD: Programmable Switch Enhanced Model-Parallel Training on Generalized Linear Models on Distributed FPGAs

Hongjing Huang, Y. Li, Jun Sun, Xiaobo Zhu, Jie Zhang, Lei Luo, Jialin Li, Zeke Wang

arXiv (Cornell University) 2023

Cited by 0

Accelerating String-Key Learned Index Structures Via Memoization-based Incremental Training

Minsu Kim, Jinwoo Hwang, Guseul Heo, Seiyeon Cho, Divya Mahajan, Jongse Park

Proceedings of the VLDB Endowment 2024

Cited by 3
