An Efficient Machine Learning Approach for Breast Cancer Prediction (Pendekatan Machine Learning yang Efisien untuk Prediksi Kanker Payudara)

  • Azminuddin I. S. Azis Universitas Ichsan Gorontalo
  • Irma Surya Kumala Idris Universitas Ichsan Gorontalo
  • Budy Santoso Universitas Ichsan Gorontalo
  • Yasin Aril Mustofa Universitas Ichsan Gorontalo
Keywords: machine learning, breast cancer prediction, missing value replacement, feature selection, unbalanced class


Breast cancer is the most common cancer found in women, and its death rate still ranks second among all cancers. Related studies often report high accuracy for the machine learning approaches they propose. Without efficient pre-processing, however, the proposed breast cancer prediction models remain open to question. This research therefore aims to improve the accuracy of machine learning methods through more efficient pre-processing for breast cancer prediction: Missing Value Replacement, Data Transformation, Smoothing Noisy Data, Feature Selection / Attribute Weighting, Data Validation, and Unbalanced Class Reduction. The study proposes four approaches: C4.5 with Z-Score and Genetic Algorithm for the Breast Cancer Dataset (77.27% accuracy); 7-Nearest Neighbor with Min-Max Normalization and Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Original (97.85% accuracy); Artificial Neural Network with Z-Score and Forward Selection for the Wisconsin Breast Cancer Dataset - Diagnostic (98.24% accuracy); and 11-Nearest Neighbor with Min-Max Normalization and Particle Swarm Optimization for the Wisconsin Breast Cancer Dataset - Prognostic (83.33% accuracy). These approaches perform better than the standard machine learning methods and than the best methods proposed in previous related studies.
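The normalization and nearest-neighbor steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`fit_min_max`, `apply_min_max`, `z_scores`, `knn_predict`) and the toy feature values are hypothetical, the data does not come from any of the four datasets, and the feature selection step (Particle Swarm Optimization, Genetic Algorithm, Forward Selection) is omitted entirely.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Return per-column (minimum, maximum) learned from the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(row, lows, highs):
    """Rescale one row with the training minima/maxima (0.0 for constant columns)."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)]


def z_scores(values):
    """Z-Score a single column: subtract the mean, divide by the population std."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training rows closest to query (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: hypothetical values for illustration only, not from the paper.
features = [[1.0, 200.0], [2.0, 220.0], [1.5, 210.0],
            [8.0, 900.0], [9.0, 950.0], [8.5, 920.0]]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

# Fit the scaling on the training rows, then reuse it for the unseen query.
lows, highs = fit_min_max(features)
train_scaled = [apply_min_max(row, lows, highs) for row in features]
query_scaled = apply_min_max([7.5, 880.0], lows, highs)

prediction = knn_predict(train_scaled, labels, query_scaled, k=3)
print(prediction)
```

Without rescaling, the second feature (hundreds) would dominate the Euclidean distance over the first (single digits). That is one reason a distance-based classifier such as k-Nearest Neighbor is usually paired with a normalization step.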






How to Cite
Azis, A. I. S., Irma Surya Kumala Idris, Budy Santoso, & Yasin Aril Mustofa. (2019). Pendekatan Machine Learning yang Efisien untuk Prediksi Kanker Payudara. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 3(3), 458 - 469.
Information Technology Articles