Increasing the Accuracy of Brain Stroke Classification using Random Forest Algorithm with Mutual Information Feature Selection
Abstract
Brain stroke is a leading cause of death, which distinguishes it from common illnesses and underlines the need for machine learning techniques that can identify its symptoms. Among these techniques, the Random Forest (RF) algorithm emerged as the main candidate because of its high accuracy. RF was chosen for its ensemble learning properties: it bags the outputs of many decision trees (DT), which improves accuracy. Feature selection, an important data analysis step usually carried out during pre-processing, aims to identify influential features and discard less impactful ones. Mutual Information is the feature selection method used here. The highest accuracy was obtained with 10-fold cross-validation: 0.7760 without feature selection and 0.7790 with Mutual Information. Most attributes in the brain stroke dataset are relevant to the stroke disease class, and the resulting decision tree shows age as a particularly important node. The results therefore show that Mutual Information feature selection can increase the accuracy of brain stroke classification, although not significantly: the increase is 0.0030, or 0.30 percentage points. Because the difference is this small, it can be said that almost all attributes in the brain stroke dataset used are relevant to the stroke disease class.
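To make the feature selection step concrete, the following is a minimal sketch of how Mutual Information can score discrete features against a class label. It is not the authors' implementation: the function names, the toy data, and the feature names `age_over_60`, `hypertension`, and `noise` are all hypothetical and chosen only for illustration. The score is I(X; Y) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y))), estimated from counts and reported in nats.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in nats from two equal-length sequences of discrete values."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(values, labels)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up toy data (NOT the brain stroke dataset from the paper).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "age_over_60": [0, 0, 0, 0, 1, 1, 1, 1],   # identical to the label
    "hypertension": [0, 0, 0, 1, 0, 1, 1, 1],  # partly related to the label
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],         # independent of the label
}

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this toy example the feature that mirrors the label scores log 2, and the independent feature scores 0. Continuous attributes such as age would first need to be discretized, or scored with an estimator designed for continuous inputs such as scikit-learn's `mutual_info_classif`. Whether the paper used that function is not stated in the abstract.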
Copyright (c) 2024 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate, non-exclusive arrangements to distribute the published manuscript in other versions (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.