The Effect of Oversampling on the Classification of Hypertension with the Naïve Bayes Algorithm, Decision Tree, and Artificial Neural Network (ANN)
Abstract
Oversampling is a technique for balancing the number of records in each class by generating additional records for the class with fewer records (the minority class) until its size matches that of the class with more records (the majority class). In this study, oversampling is applied to a hypertension dataset in which the hypertensive class has fewer records than the non-hypertensive class. The study aims to evaluate the effect of oversampling on classifying this dataset into hypertensive and non-hypertensive classes using the Naïve Bayes, Decision Tree, and Artificial Neural Network (ANN) algorithms, and to identify the best model among the three. The data are preprocessed by imputing missing values, oversampling, and transforming the features into the same range; Naïve Bayes, Decision Tree, and ANN classification models are then built on the result. With 80% of the data used as training data to build the models and 20% as validation data to test them, classification performance in terms of accuracy, precision, and recall was higher on the oversampled data than on the data without oversampling. The best performance was obtained with ANN, which achieved an accuracy of 0.91, a precision of 0.86, and a recall of 0.99.
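The abstract does not name the oversampling method or the range transformation that was used. The sketch below is a minimal illustration under stated assumptions: it uses a SMOTE-style oversampler (each synthetic minority record is placed on the segment between a minority record and one of its nearest minority neighbours), min-max scaling as one possible way of "transforming the features into the same range", and the standard definitions of accuracy, precision, and recall. All function names (`smote`, `oversample`, `fit_min_max`, `apply_min_max`, `classification_scores`) and the toy data are hypothetical. Missing-value imputation and the three classifiers are omitted, and only the Python standard library is used.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Return n_new synthetic rows built by SMOTE-style interpolation.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours
    (Euclidean distance).
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority rows")
    if rng is None:
        rng = random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def oversample(X, y, minority_label, k=5, seed=0):
    """Append synthetic minority rows until both classes have equal counts.

    Assumes a binary problem; the input lists X and y are not modified.
    """
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_rows = smote(minority, n_new, k=k, rng=random.Random(seed))
    return X + new_rows, y + [minority_label] * n_new


def fit_min_max(X):
    """Return per-column minimum and maximum computed from the rows in X."""
    cols = list(zip(*X))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(X, lo, hi):
    """Map the fitted [min, max] of each column onto [0, 1].

    Constant columns (max == min) are mapped to 0.0.
    """
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in X
    ]


def classification_scores(y_true, y_pred, positive):
    """Return (accuracy, precision, recall) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Hypothetical toy training rows with two numeric features; 1 = hypertensive.
    X_train = [[25, 110], [30, 115], [35, 120], [40, 118], [45, 122],
               [50, 125], [55, 150], [60, 160]]
    y_train = [0, 0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = oversample(X_train, y_train, minority_label=1, k=1)
    lo, hi = fit_min_max(X_bal)
    X_scaled = apply_min_max(X_bal, lo, hi)
    print(y_bal.count(0), y_bal.count(1))  # 6 6
```

In this sketch, oversampling and the scaling bounds are computed on training rows only, so that validation scores are measured on original records. The abstract does not state whether the 80/20 split was made before or after oversampling, so this ordering is an assumption rather than a description of the study's procedure.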
Copyright (c) 2020 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.