Data Mining Techniques for Predictive Classification of Anemia Disease Subtypes

Keywords: anemia, data mining, J48 decision tree, Naïve Bayes, random forest


Anemia, characterized by an insufficient number of red blood cells or reduced hemoglobin, impairs oxygen transport in the body. Distinguishing among the types of anemia is vital for tailoring prevention and treatment. This research explores the role of data mining in predicting and classifying anemia types from Complete Blood Count (CBC) and demographic data, with the aim of building models that aid healthcare professionals in diagnosis and treatment. The study follows the six phases of the Cross-Industry Standard Process for Data Mining (CRISP-DM). We compared the Naïve Bayes, J48 Decision Tree, and Random Forest algorithms in RapidMiner, evaluating accuracy, mean recall, and mean precision. The J48 Decision Tree outperformed the other two, highlighting the importance of algorithm choice in anemia classification models. The analysis also identified renal disease-related anemia and chronic anemia as the most prevalent types, with a higher incidence among women. Recognizing such gender disparities in prevalence, and the demographic factors associated with specific anemia types, can inform personalized healthcare decisions and care strategies.
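The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It assumes that "mean recall" and "mean precision" are unweighted averages of the per-class values, which may differ from how RapidMiner reports them. The function name `classification_summary`, the class labels, and the toy predictions are hypothetical and are not data or results from the study.

```python
from collections import Counter


def classification_summary(y_true, y_pred):
    """Return accuracy, mean recall, and mean precision.

    Assumption: "mean" is the unweighted (macro) average over classes.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    # Correct predictions per class (true positives).
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    actual = Counter(y_true)      # number of samples that belong to each class
    predicted = Counter(y_pred)   # number of samples assigned to each class

    # A class that never occurs (or is never predicted) contributes 0.0.
    recalls = [true_pos[c] / actual[c] if actual[c] else 0.0 for c in labels]
    precisions = [true_pos[c] / predicted[c] if predicted[c] else 0.0 for c in labels]

    return {
        "accuracy": sum(true_pos.values()) / len(y_true),
        "mean_recall": sum(recalls) / len(labels),
        "mean_precision": sum(precisions) / len(labels),
    }


# Hypothetical toy labels, for illustration only (not study data).
y_true = ["other", "chronic", "chronic", "renal", "renal", "renal"]
y_pred = ["other", "chronic", "renal", "renal", "renal", "chronic"]

summary = classification_summary(y_true, y_pred)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the predictions of each candidate classifier over a shared test set would give one way to compare them on equal terms.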



Author Biography

Johan Setiawan, Universitas Multimedia Nusantara

Information Systems Study, Faculty of Engineering and Informatics
Universitas Multimedia Nusantara



How to Cite
Setiawan, J., Amalia, D., & Prasetiawan, I. (2024). Data Mining Techniques for Predictive Classification of Anemia Disease Subtypes. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 8(1), 10–17.
Information Systems Engineering Articles