Peningkatan Hasil Klasifikasi pada Algoritma Random Forest untuk Deteksi Pasien Penderita Diabetes Menggunakan Metode Normalisasi

Improved Classification Results in the Random Forest Algorithm for Detection of Diabetes Patients Using the Normalization Method

  • Gde Agung Brahmana Suryanegara, Universitas Telkom
  • Adiwijaya, Universitas Telkom
  • Mahendra Dwifebri Purbolaksono, Universitas Telkom

Keywords: diabetes, classification, min-max normalization, z-score normalization, random forest.

Abstract

Diabetes is a disease caused by blood sugar levels that exceed normal limits. The number of people with diabetes in Indonesia has increased significantly: Basic Health Research reports that prevalence rose from 6.9% in 2013 to 8.5% in 2018, with an estimated more than 16 million sufferers. A technology that can detect diabetes accurately is therefore needed, so that the disease can be treated early to reduce the number of sufferers, disabilities, and deaths. The attributes in the Gula Karya Medika data have different value scales, which can complicate classification. For this reason, this study compares two data normalization methods, min-max normalization and z-score normalization, against a baseline without data normalization, using Random Forest (RF) as the classification method. RF has been tested in several previous studies and is able to produce good performance with high accuracy. Based on the results, the best accuracy is achieved by model 1 (min-max normalization with RF) at 95.45%, followed by model 2 (z-score normalization with RF) at 95%, and model 3 (RF without data normalization) at 92%. It can be concluded that model 1 outperforms the other two models and raises the accuracy of Random Forest classification to 95.45%.
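As an illustration of the two normalization methods named in the abstract, the following is a minimal sketch using their standard textbook definitions; it is not the authors' implementation. The function names and the sample glucose values are hypothetical, and the z-score variant here assumes the population standard deviation.

```python
from statistics import mean, pstdev


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they fall in [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no scale information.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def z_score_normalize(values):
    """Center values on the mean and divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant attribute maps to zero everywhere.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical attribute values, for illustration only.
glucose = [85.0, 120.0, 155.0, 190.0]
glucose_min_max = min_max_normalize(glucose)
glucose_z = z_score_normalize(glucose)
```

In a setup like the one described, each attribute column would be normalized this way before being passed to the classifier, so that attributes measured on large scales do not dominate those on small scales.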

 

Published
2021-02-20
How to Cite
Gde Agung Brahmana Suryanegara, Adiwijaya, & Mahendra Dwifebri Purbolaksono. (2021). Peningkatan Hasil Klasifikasi pada Algoritma Random Forest untuk Deteksi Pasien Penderita Diabetes Menggunakan Metode Normalisasi. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 5(1), 114 - 122. https://doi.org/10.29207/resti.v5i1.2880
Section
Information Technology Articles