Comparison of CART and Random Forest for Cancer Detection Based on Microarray Data Classification
Abstract
Cancer is one of the deadliest diseases in the world, with a mortality rate of 57.3% in Asia in 2018. Early diagnosis is therefore needed to keep cancer mortality from rising. With advances in machine learning, gene expression data measured with microarrays can be analyzed to detect cancer at an early stage. However, microarray data contain so many attributes (genes) that dimensionality reduction is necessary. To address this problem, this study uses the Discrete Wavelet Transform (DWT) for dimensionality reduction, with Classification and Regression Tree (CART) and Random Forest (RF) as the classification methods. The purpose of comparing these two classifiers is to determine which one performs best when combined with DWT dimensionality reduction. This research uses five microarray datasets, namely Colon Tumors, Breast Cancer, Lung Cancer, Prostate Tumors, and Ovarian Cancer, from the Kent-Ridge Biomedical Dataset repository. The best accuracies obtained were 76.92% for Breast Cancer with CART-DWT, 90.1% for Colon Tumors with RF-DWT, 100% for Lung Cancer with RF-DWT, 95.49% for Prostate Tumors with RF-DWT, and 100% for Ovarian Cancer with RF-DWT. Because RF-DWT achieved the best accuracy on four of the five datasets, it is concluded that RF-DWT performs better than CART-DWT.
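The pipeline described in the abstract (DWT for dimensionality reduction, followed by CART or RF for classification) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the wavelet family or decomposition level, so a Haar wavelet applied over three levels (keeping only the approximation coefficients) is assumed. The scikit-learn classifiers, their hyperparameters, 5-fold cross-validation, and the synthetic data from `make_classification` standing in for the Kent-Ridge datasets are likewise illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def haar_dwt_approx(X: np.ndarray, level: int = 3) -> np.ndarray:
    """Reduce each row (sample) of X with a multi-level Haar DWT.

    Assumption: only the approximation (low-pass) coefficients are kept.
    Each level halves the number of features; an odd-length row is
    zero-padded by one column before that level is applied.
    """
    A = np.asarray(X, dtype=float)
    for _ in range(level):
        if A.shape[1] % 2:
            A = np.pad(A, ((0, 0), (0, 1)))  # pad to an even length with zeros
        # Haar low-pass filter followed by downsampling by 2
        A = (A[:, 0::2] + A[:, 1::2]) / np.sqrt(2.0)
    return A


def compare_cart_rf(X, y, level=3, folds=5, seed=0):
    """Apply DWT reduction, then return (n_features, mean CV accuracy per model).

    Hypothetical helper: hyperparameters are illustrative, not from the paper.
    """
    X_red = haar_dwt_approx(X, level)
    models = {
        "CART-DWT": DecisionTreeClassifier(criterion="gini", random_state=seed),
        "RF-DWT": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {
        name: cross_val_score(model, X_red, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }
    return X_red.shape[1], scores


if __name__ == "__main__":
    # Synthetic stand-in for a microarray matrix: 100 samples x 2000 "genes"
    X, y = make_classification(
        n_samples=100, n_features=2000, n_informative=20, random_state=0
    )
    n_feat, scores = compare_cart_rf(X, y)
    print(f"features after DWT: {n_feat}")
    for name, acc in scores.items():
        print(f"{name}: mean accuracy = {acc:.4f}")
```

Each Haar level halves the number of attributes, so three levels reduce the 2000 synthetic "genes" to 250 features before classification. The printed accuracies come from synthetic data only and say nothing about the results reported in the abstract.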
Copyright (c) 2020 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright of each article belongs to its author(s).
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the work, which is published under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., posting it to the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.