Multivariate Analysis for Characterization of Air Pollution Sources: Part 1 Prior Data Screening and Underlying Assumptions

Polish Journal of Environmental Studies (2024)

Abstract

Findings obtained from different multivariate methods, which rest on different assumptions and differ in their sensitivity to data errors, need to be comparable and consistent. This study investigates essential aspects of data screening prior to analysis, particularly outlier detection, communalities, multicollinearity, and the Kaiser-Meyer-Olkin (KMO) and Bartlett's tests. It also examines how changing test parameters, such as the number of convergence, the number of bootstrap runs, the FPEAK value, and the minimum coefficient of determination (R²), influences model results. Positive matrix factorization (PMF) and Unmix were applied to monitoring data collected from a receptor site. Communality estimates and multicollinearity diagnostics indicated possible data errors in Ca, Cu, Na, and Mn, which affected the stability of the source profiles. PMF detected biomass burning, coal combustion, traffic, industrial emissions, Mn-enriched sources, and secondary aerosols. The Unmix model identified similar sources with comparable profiles, except that the profiles of vehicle exhaust and industrial emissions differed slightly. Compared with PMF, Unmix was more strongly influenced by outliers and multicollinearity and, to a lesser extent, by changes in sample size. We recommend interpreting the bootstrapping results rather than the basic runs for both PMF and Unmix. We also recommend data screening prior to further modeling. We suggest checking multicollinearity using more than one statistical measure, particularly VIF (Variance Inflation Factor) values together with tolerance values.
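
The multicollinearity check recommended above can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `vif_and_tolerance`, the synthetic three-column data set, and the use of NumPy are all assumptions made for illustration. The sketch relies on the standard identity that the VIF of variable j equals the j-th diagonal element of the inverse correlation matrix, and that tolerance is its reciprocal (1 - R_j²).

```python
import numpy as np


def vif_and_tolerance(X):
    """Return (vif, tolerance) arrays with one entry per column of X.

    Hypothetical helper: rows of X are samples, columns are measured species.
    """
    X = np.asarray(X, dtype=float)
    # Correlation matrix between columns (variables).
    corr = np.corrcoef(X, rowvar=False)
    # Diagonal of the inverse correlation matrix gives 1 / (1 - R_j^2),
    # the VIF of each variable regressed on all the others.
    vif = np.diag(np.linalg.inv(corr))
    # Tolerance is the reciprocal of VIF, i.e. 1 - R_j^2.
    tolerance = 1.0 / vif
    return vif, tolerance


# Synthetic data (an assumption, not the study's measurements):
# column "c" is built to be nearly collinear with column "a".
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.1 * rng.normal(size=n)
X = np.column_stack([a, b, c])

vif, tol = vif_and_tolerance(X)
for name, v, t in zip(["a", "b", "c"], vif, tol):
    print(f"{name}: VIF={v:.2f}, tolerance={t:.3f}")
```

Because tolerance is the reciprocal of VIF, the two measures flag the same variables. A commonly cited rule of thumb, which is not taken from this abstract, treats VIF above 10 (tolerance below 0.1) as a sign of problematic collinearity. In the synthetic example, columns "a" and "c" would be flagged while "b" would not.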

Keywords

Multivariate analysis, modeling, data screening, outliers, multicollinearity, bootstrapping