Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine Hybrid Model

Nature Environment and Pollution Technology (2019)

Abstract
Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique for predicting qualitative variables and is generally preferred over regression from an operational point of view. Owing to the enormous increase in air pollution in many countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study introduces a new hybrid classification model based on information theory and the support vector machine (SVM), using air quality data from four cities in China, namely Beijing, Guangzhou, Shanghai and Tianjin, from January 1, 2014 to April 30, 2016. China’s Ministry of Environmental Protection classifies daily air quality into six levels, namely serious pollution, severe pollution, moderate pollution, light pollution, good and excellent, based on the corresponding air quality index (AQI) values. Using information theory, the information gain (IG) of each feature is calculated, and feature selection is carried out for both categorical and continuous numeric features. An SVM is then trained on the selected features with cross-validation. The final evaluation shows that the IG and SVM hybrid model performs better than SVM alone, artificial neural network (ANN) and K-nearest neighbours (KNN) models in terms of both accuracy and complexity.
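The information gain step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how continuous features are handled, so the binary threshold split used here is an assumption, and the feature names (`pm25`, `season`), the toy values and the `min_gain` cutoff are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain_categorical(values, labels):
    """IG = H(labels) - sum over v of p(v) * H(labels | feature == v)."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def info_gain_continuous(values, labels):
    """Assumed handling of numeric features: take the best IG over binary
    splits placed at midpoints between consecutive distinct values."""
    distinct = sorted(set(values))
    best = 0.0
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        binned = [v <= threshold for v in values]
        best = max(best, info_gain_categorical(binned, labels))
    return best


def select_features(gains, min_gain):
    """Keep the features whose information gain reaches min_gain."""
    return [name for name, g in gains.items() if g >= min_gain]


# Hypothetical toy data: four days, two of the six AQI levels.
labels = ["good", "good", "light pollution", "light pollution"]
pm25 = [20.0, 30.0, 90.0, 110.0]                  # continuous feature
season = ["winter", "summer", "winter", "summer"]  # categorical feature

gains = {
    "pm25": info_gain_continuous(pm25, labels),
    "season": info_gain_categorical(season, labels),
}
selected = select_features(gains, min_gain=0.1)  # cutoff chosen arbitrarily
print(gains, selected)
```

In this toy example `pm25` separates the two classes perfectly while `season` carries no information about the label, so only `pm25` is kept. In a full pipeline the selected columns would then be passed to an SVM evaluated with cross-validation, for instance scikit-learn's `SVC` together with `cross_val_score`; that choice of library is likewise an assumption, not something the abstract specifies.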
Keywords
environment