Machine Learning Models for Air Pollution Health Risk Assessment

Lipatova A.V.; Potapchenko T.D.

doi:10.29207/joseit.v4i1.6544

Lipatova A.V., Moscow Technical University of Communications and Informatics, Moscow, Russian Federation
Potapchenko T.D., Moscow Technical University of Communications and Informatics, Moscow, Russian Federation

Keywords: air pollution classification, public health risk assessment, machine learning, ensemble models, environmental monitoring

Abstract

This study explores the application of machine learning (ML) models and artificial neural networks (ANNs) to the assessment of public health risks associated with air pollution. Using a dataset of more than 12,000 records from India and Nepal that combines quantitative measurements with visual data, several classification models were built and evaluated to predict air quality index (AQI) categories corresponding to different health risk levels. The models comprise a decision tree (DT), a support vector machine (SVM), a random forest (RF), XGBoost, and deep neural networks (both convolutional and recurrent). The methodology consisted of data preprocessing, feature importance analysis, and model evaluation using accuracy metrics and ROC curves. All models achieved high classification accuracy (>90%), with ensemble-based methods performing best. XGBoost combined high accuracy with efficient resource use, while the ANN models, particularly long short-term memory (LSTM), reached 98% accuracy by the 15th training epoch. Feature importance analysis identified AQI, PM2.5, and PM10 as the primary predictors of the health risk category. Correlation analysis showed a strong association between the particulate matter measures (PM2.5 and PM10), underscoring their importance in air quality evaluation. The study proposes a methodological framework for automating risk assessment with machine learning to support more effective environmental health monitoring. The findings suggest that ensemble models offer a favorable balance between predictive accuracy and computational efficiency for real-time air quality classification systems, with potential applications in early warning systems and public health interventions.
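
The evaluation step mentioned in the abstract (accuracy and ROC curves) can be illustrated with a minimal sketch. This is not the authors' code: the category names, the toy labels, and the scores below are hypothetical, and the one-vs-rest treatment of the multiclass ROC is an assumption, since the abstract does not say how the ROC curves were computed. The sketch uses only the Python standard library and computes the area under the ROC curve through its rank (Mann-Whitney) formulation.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of records whose predicted category equals the true one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def roc_auc_one_vs_rest(
    y_true: Sequence[str], scores: Sequence[float], positive: str
) -> float:
    """Area under the ROC curve for one category against all others.

    Rank formulation: the probability that a randomly chosen positive
    record receives a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative record")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): true categories, predicted
# categories, and a model's score for the "Unhealthy" category per record.
y_true = ["Good", "Moderate", "Unhealthy", "Unhealthy", "Good", "Moderate"]
y_pred = ["Good", "Moderate", "Unhealthy", "Moderate", "Good", "Moderate"]
unhealthy_score = [0.05, 0.30, 0.90, 0.40, 0.10, 0.45]

acc = accuracy(y_true, y_pred)
auc = roc_auc_one_vs_rest(y_true, unhealthy_score, "Unhealthy")
print(f"accuracy={acc:.3f}  AUC(Unhealthy vs rest)={auc:.3f}")
```

On this toy data five of six predictions are correct, and seven of the eight positive-negative score pairs are ranked correctly. In practice the same quantities would be computed per AQI category on a held-out test set.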
