The Effect of Resampling Techniques on Model Performance Classification of Maternal Health Risks

Nia Mauliza; Aisha Shakila Iedwan; Yoga Pristyanto; Anggit Dwi Hartanto; Arif Nur Rohman

doi:10.29207/resti.v8i4.5934

Nia Mauliza Universitas Amikom Yogyakarta
Aisha Shakila Iedwan Universitas Amikom Yogyakarta
Yoga Pristyanto Universitas Amikom Yogyakarta
Anggit Dwi Hartanto Universitas Amikom Yogyakarta
Arif Nur Rohman Universitas Amikom Yogyakarta


Keywords: class imbalance, resampling methods, classification algorithms, maternal health, prediction accuracy, machine learning

Abstract

Indonesia's maternal mortality rate was the second highest in ASEAN, and maternal health data used for risk prediction were affected by class imbalance. This research aimed to improve prediction accuracy in classifying diseases in pregnant women by applying several resampling methods: Synthetic Minority Over-sampling Technique (SMOTE), SMOTE-Edited Nearest Neighbor (SMOTE-ENN), Adaptive Synthetic Sampling (ADASYN), and ADASYN-ENN. Each method was combined with five classification algorithms: Decision Tree, K-Nearest Neighbor (KNN), Naïve Bayes, Random Forest, and Support Vector Machine (SVM). Performance was evaluated using accuracy, precision, recall, and F1-score to determine the best method and algorithm. The results showed that SMOTE-ENN and ADASYN-ENN significantly improved model performance in predicting maternal disease, and that Random Forest and Decision Tree gave the best results in terms of accuracy and consistency. These findings provide practical guidance for applying resampling techniques to the classification of pregnant women's health data, which could contribute to improving the quality of maternal health services in Indonesia.
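The abstract names SMOTE as one of the resampling methods, but this page does not show how it was implemented. As an illustration only, the following is a minimal sketch of SMOTE-style oversampling, assuming numeric features and Euclidean distance. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and are not taken from the article. The ENN cleaning step used in SMOTE-ENN and ADASYN-ENN is omitted.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy data (made up for this sketch): six majority and three minority samples.
majority = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (0.5, 0.3), (0.4, 0.5), (0.6, 0.2)]
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]

# Generate enough synthetic samples to match the majority class size.
new_points = smote_sketch(minority, len(majority) - len(minority), k=2, seed=42)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Each synthetic point lies on a line segment between two existing minority samples, so oversampling fills in the minority region rather than duplicating records. In a real experiment, resampling would normally be applied to the training split only, so that the test data stay untouched.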


