Improving Diabetes Prediction Accuracy in Indonesia: A Comparative Analysis of SVM, Logistic Regression, and Naive Bayes with SMOTE and ADASYN

  • Selly Rahmawati, Universitas Budi Luhur
  • Arief Wibowo, Universitas Budi Luhur
  • Anis Fitri Nur Masruriyah, Universitas Buana Perjuangan Karawang
Keywords: Adaptive Synthetic Sampling, Diabetes Mellitus, Logistic Regression, Naïve Bayes, Synthetic Minority Over-sampling Technique, Support Vector Machine

Abstract

This study aims to improve the accuracy of diabetes prediction models in Indonesia by comparing Support Vector Machine (SVM), Logistic Regression, and Naïve Bayes classifiers, each trained with and without synthetic oversampling, namely the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN). The research addresses class imbalance in medical diagnostic datasets, which often biases predictions toward the majority class, in the specific setting of predicting diabetes among Indonesian patients. The dataset comprises 657 patient records from a Regional General Hospital in Indonesia, with 70% of the data allocated for training and 30% for testing. The SVM model combined with SMOTE achieved the best results, with an accuracy of 95.8% and an AUC of 99.1%, indicating that oversampling can improve prediction performance on this dataset. The findings highlight the importance of choosing an appropriate oversampling method and classifier when optimizing diabetes prediction in the Indonesian context, and they may inform future healthcare strategies.
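The abstract names a 70/30 split and SMOTE oversampling but does not give the pipeline itself. The following is a minimal sketch of how the two steps could fit together, not the authors' implementation. The function names (`split_70_30`, `smote`, `balance_training_set`), the neighbour count `k=5`, the toy records, and the choice to oversample only the training portion are all assumptions made for illustration. In practice a library such as imbalanced-learn, which provides `SMOTE` and `ADASYN` classes, would likely be used instead.

```python
import math
import random


def split_70_30(rows, seed=0):
    """Shuffle the rows and split them into 70% training and 30% testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.
    Assumes `minority` holds at least two feature tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance_training_set(train, k=5, seed=0):
    """Oversample the smaller class of `train`, a list of (features, label)
    pairs, until both classes have the same number of samples."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    minority_label = min(by_label, key=lambda lab: len(by_label[lab]))
    majority_size = max(len(points) for points in by_label.values())
    minority = by_label[minority_label]
    n_new = majority_size - len(minority)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return train + [(p, minority_label) for p in new_points]


if __name__ == "__main__":
    # Hypothetical toy records: (feature tuple, label), roughly 20% positive.
    toy = [((float(i), float(i % 7)), 1 if i % 5 == 0 else 0)
           for i in range(100)]
    train, test = split_70_30(toy)
    balanced = balance_training_set(train)
    print(len(train), len(test), len(balanced))
```

Under the assumption made here, the test portion is left untouched so that evaluation metrics reflect the original class distribution. ADASYN differs from this sketch mainly in generating more synthetic points around minority samples that are harder to classify.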

Published
2024-10-06
How to Cite
Rahmawati, S., Wibowo, A., & Masruriyah, A. F. N. (2024). Improving Diabetes Prediction Accuracy in Indonesia: A Comparative Analysis of SVM, Logistic Regression, and Naive Bayes with SMOTE and ADASYN. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 8(5), 607-614. https://doi.org/10.29207/resti.v8i5.5980
Section
Information Technology Articles