Max Depth Impact on Heart Disease Classification: Decision Tree and Random Forest

Keywords: classification, data mining, heart disease, decision tree, Random Forest, machine learning

Abstract

Inaccurate heart disease classification can endanger a patient's life, and classification performance depends in part on the parameters of the algorithm model. This study compares the Decision Tree and Random Forest algorithms for heart disease classification and examines the influence of the maximum depth (max depth) parameter, which has significant implications: if the maximum depth is not set correctly, the classification results can be inaccurate and lead to incorrect diagnoses. The study uses five training:testing data split schemes (60%:40%, 70%:30%, 75%:25%, 80%:20%, and 90%:10%), each tested with max depth = 3, 4, 5, 6, and 7. The best accuracy was obtained with the 90%:10% scheme and max depth = 7: 99.29% for Random Forest versus 98.05% for Decision Tree. Random Forest also reached 99% precision and recall, compared with 98% for Decision Tree. In computation time, Decision Tree was faster than Random Forest, requiring 0.0075 s for training and 0.009 s for testing. Future research could examine the effect of other parameters and evaluate them on several datasets.
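The abstract describes a grid experiment: five train/test split ratios crossed with max depth values 3 to 7, comparing a single Decision Tree against a Random Forest on accuracy, precision, recall, and training/testing time. The Python sketch below illustrates that protocol under stated assumptions; it is not the authors' code. To stay dependency-free it implements a small CART-style tree (greedy Gini splits) and a bagged forest that samples sqrt(d) candidate features at each split, using only the standard library rather than a library such as scikit-learn. It runs on a hypothetical synthetic dataset produced by `make_data`, not the UCI Heart Disease data, so its numbers will not reproduce the 99.29% / 98.05% reported above. All function names (`make_data`, `build_tree`, `fit_forest`, `run_grid`, and so on), the positive class (label 1 = disease), `n_trees=10`, and the seeds are illustrative assumptions.

```python
import math
import random
import time

def make_data(n, rng, n_features=5):
    """Hypothetical synthetic stand-in for a binary heart-disease table (NOT the UCI data)."""
    X = [[round(rng.gauss(0, 1), 3) for _ in range(n_features)] for _ in range(n)]
    y = [int(r[0] + 0.8 * r[1] - 0.5 * r[2] + rng.gauss(0, 0.5) > 0) for r in X]
    return X, y

def gini(pos, count):
    p = pos / count
    return 2 * p * (1 - p)  # two-class Gini impurity: 1 - p^2 - (1 - p)^2

def best_split(X, y, features):
    """Return (weighted_gini, feature, threshold) of the best binary split, or None."""
    n, total_pos, best = len(y), sum(y), None
    for f in features:
        order = sorted(range(n), key=lambda i: X[i][f])
        left_pos = 0
        for k in range(1, n):  # candidate split: left = order[:k], right = order[k:]
            left_pos += y[order[k - 1]]
            a, b = X[order[k - 1]][f], X[order[k]][f]
            if a == b:
                continue  # no threshold separates equal values
            score = (k * gini(left_pos, k) + (n - k) * gini(total_pos - left_pos, n - k)) / n
            if best is None or score < best[0]:
                best = (score, f, (a + b) / 2)
    return best

def build_tree(X, y, max_depth, n_features, rng, depth=0):
    """CART-style tree. Leaf = class label; inner node = (feature, threshold, left, right)."""
    majority = int(2 * sum(y) >= len(y))
    if depth >= max_depth or len(set(y)) <= 1:
        return majority  # max_depth caps the number of splits on any root-to-leaf path
    split = best_split(X, y, rng.sample(range(len(X[0])), n_features))
    if split is None:
        return majority
    _, f, t = split
    def grow(ids):
        return build_tree([X[i] for i in ids], [y[i] for i in ids], max_depth, n_features, rng, depth + 1)
    left = grow([i for i in range(len(y)) if X[i][f] <= t])
    right = grow([i for i in range(len(y)) if X[i][f] > t])
    return (f, t, left, right)

def predict_tree(node, x):
    while isinstance(node, tuple):  # descend until a leaf label is reached
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, max_depth, n_trees, rng):
    """Bagging plus a random subset of sqrt(d) features considered at every split."""
    m = max(1, int(math.sqrt(len(X[0]))))
    boots = ([rng.randrange(len(y)) for _ in range(len(y))] for _ in range(n_trees))
    return [build_tree([X[i] for i in b], [y[i] for i in b], max_depth, m, rng) for b in boots]

def predict_forest(forest, x):
    votes = sum(predict_tree(tree, x) for tree in forest)
    return int(2 * votes >= len(forest))  # majority vote; ties go to class 1

def train_test_split(X, y, train_fraction, seed):
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = round(train_fraction * len(y))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]

def scores(y_true, y_pred):
    """Accuracy, precision, recall with class 1 ('disease') as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp, fp, fn = pairs.count((1, 1)), pairs.count((0, 1)), pairs.count((1, 0))
    acc = (tp + pairs.count((0, 0))) / len(pairs)
    return acc, (tp / (tp + fp) if tp + fp else 0.0), (tp / (tp + fn) if tp + fn else 0.0)

def timed(fn, *args):
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start

def run_grid(seed=0, n_samples=240, n_trees=10):
    """Score DT and RF on every (train fraction, max_depth) pair in the abstract's grid."""
    rng = random.Random(seed)
    X, y = make_data(n_samples, rng)
    results = {}
    for frac in (0.6, 0.7, 0.75, 0.8, 0.9):
        Xtr, ytr, Xte, yte = train_test_split(X, y, frac, seed)
        for max_depth in (3, 4, 5, 6, 7):
            tree, dt_fit = timed(build_tree, Xtr, ytr, max_depth, len(Xtr[0]), rng)
            dt_pred, dt_test = timed(lambda: [predict_tree(tree, x) for x in Xte])
            forest, rf_fit = timed(fit_forest, Xtr, ytr, max_depth, n_trees, rng)
            rf_pred, rf_test = timed(lambda: [predict_forest(forest, x) for x in Xte])
            results[(frac, max_depth)] = {"DT": scores(yte, dt_pred) + (dt_fit, dt_test),
                                          "RF": scores(yte, rf_pred) + (rf_fit, rf_test)}
    return results

if __name__ == "__main__":
    for (frac, max_depth), r in sorted(run_grid().items()):
        print(f"train={frac:.0%} max_depth={max_depth} DT acc={r['DT'][0]:.3f} RF acc={r['RF'][0]:.3f}")
```

In this sketch `max_depth` counts splits from the root, so `max_depth=3` allows at most three tests on any root-to-leaf path (the same convention scikit-learn's `max_depth` uses). `run_grid()` returns a dict keyed by `(train_fraction, max_depth)` whose values hold `(accuracy, precision, recall, fit_seconds, test_seconds)` for each model. Because the forest fits `n_trees` trees, its fit and prediction time grow roughly linearly with `n_trees`, which is why a single tree is faster here.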


Author Biographies

Arief Hermawan, Universitas Teknologi Yogyakarta

Master's Program in Information Technology, Graduate School, Universitas Teknologi Yogyakarta

Donny Avianto, Universitas Teknologi Yogyakarta

Informatics Study Program, Faculty of Science and Technology, Universitas Teknologi Yogyakarta


Published: 2024-02-21

How to Cite
Rian Oktafiani, Arief Hermawan, & Donny Avianto. (2024). Max Depth Impact on Heart Disease Classification: Decision Tree and Random Forest. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 8(1), 160–168. https://doi.org/10.29207/resti.v8i1.5574

Section: Information Technology Articles