DATA ANALYSIS PLATFORM FOR STREAM AND BATCH DATA PROCESSING ON HYBRID COMPUTING RESOURCES
9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (2021)
Abstract
The modern Big Data ecosystem provides tools to build a flexible platform for processing data streams and batch datasets. Supporting both the operation of modern large-scale particle physics experiments and the services that many individual physics researchers rely on involves generating and transferring large amounts of semi-structured data. It is therefore promising to apply cutting-edge technologies to study these data flows and make service provisioning more effective. In this work, we describe the structure and implementation of our data analysis platform, built on an Apache Spark cluster. With official support for GPU computing now available in Spark version 3, we propose a change in the architecture to utilize these more performant resources while keeping the platform's functionality provided by mainstream Big Data software. Furthermore, the need for GPU support entails a change in the computing resource management infrastructure from Apache Mesos to Kubernetes. Finally, to demonstrate the features and operation of the system, we use the task of network packet analysis for security monitoring and anomaly detection in both batch and stream modes.