Nonlinear Models Over Normalized Data

2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019

Abstract
Machine Learning (ML) applications are proliferating in the enterprise, and enterprise data are increasingly used to build sophisticated ML models that assist critical business functions. Relational data, which are prevalent in enterprise applications, are typically normalized; as a result, data have to be denormalized via primary/foreign-key joins before they can be provided as input to ML algorithms. In this paper we study the implementation of popular nonlinear ML models, in particular independent Gaussian Mixture Models (IGMM), over normalized data. For IGMM we propose algorithms that take the statistical properties of the Gaussians into account to construct mixture models, factorizing the computation. In this way we demonstrate that model training can be conducted much faster than with other applicable approaches, without any loss in accuracy. We present the results of a thorough experimental evaluation, varying several parameters of the input relations, and demonstrate that our proposals for IGMM yield drastic performance improvements that become increasingly larger as the parameters of the underlying data vary, without any loss in accuracy.
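To illustrate the core idea of factorized computation over normalized data, the following sketch (not the paper's actual algorithm; the relations, names, and statistics are illustrative assumptions) compares the naive approach of materializing the primary/foreign-key join against a factorized approach that pushes join multiplicities into the Gaussian sufficient statistics, so the referenced relation is scanned only once:

```python
import numpy as np

# Toy normalized schema (hypothetical): relation S references relation R
# via a foreign key, many-to-one. R holds the feature value x per key.
R = {1: 2.0, 2: 5.0, 3: -1.0}   # key -> feature x
S_fk = [1, 1, 2, 3, 3, 3]       # foreign keys carried by S's tuples

# Naive approach: denormalize (materialize the join), then compute the
# Gaussian sufficient statistics over the joined, redundant table.
joined = np.array([R[fk] for fk in S_fk])
naive_mean = joined.mean()
naive_var = joined.var()

# Factorized approach: count how often each R tuple appears in the join,
# then weight its contribution to the sufficient statistics by that count.
counts = {k: S_fk.count(k) for k in R}
n = sum(counts.values())
s1 = sum(c * R[k] for k, c in counts.items())       # sum of x
s2 = sum(c * R[k] ** 2 for k, c in counts.items())  # sum of x^2
fact_mean = s1 / n
fact_var = s2 / n - fact_mean ** 2

assert np.isclose(naive_mean, fact_mean)
assert np.isclose(naive_var, fact_var)
```

Both approaches yield identical statistics, but the factorized one avoids replicating R's tuples, which is the source of the speedups the abstract describes when fan-out is high.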
Keywords
Computational modeling,Data models,Gaussian mixture model,Analytical models,Gaussian distribution,Convergence