Comparison of Genetic Algorithm and Recursive Feature Elimination on High Dimensional Data
Abstract
Companies currently use big data in file processing. Large files can slow processing, so feature selection is used to reduce the number of features and thereby address the problem of high-dimensional data. On the WDC dataset, which has 30 features and 569 data points, feature selection is performed using Recursive Feature Elimination (RFE) and a Genetic Algorithm (GA). The evaluation values of the two methods are then compared to determine which is better suited to the problem. The evaluation results and discussion in Tables 1 to 14 show that GA feature selection scores slightly higher than RFE on accuracy and on the macro and weighted averages of precision, recall, and F1 score. It is therefore concluded that GA feature selection is better at handling high-dimensional data.
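The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes that scikit-learn's bundled breast cancer data (569 samples, 30 features) can stand in for the dataset, and the classifier (logistic regression), the 80/20 split, the number of features kept by RFE, and every GA setting (population size, generations, mutation rate, single-point crossover) are hypothetical choices made only for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: this bundled dataset (569 x 30) stands in for the paper's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)


def make_clf():
    # Hypothetical base classifier; the abstract does not name one.
    return LogisticRegression(max_iter=1000)


def evaluate(mask):
    """Train on the selected columns and score on the held-out split."""
    clf = make_clf().fit(X_train[:, mask], y_train)
    pred = clf.predict(X_test[:, mask])
    return {
        "n_features": int(mask.sum()),
        "accuracy": accuracy_score(y_test, pred),
        "f1_macro": f1_score(y_test, pred, average="macro"),
        "f1_weighted": f1_score(y_test, pred, average="weighted"),
    }


# --- Recursive Feature Elimination: drop the weakest feature each round ---
rfe = RFE(make_clf(), n_features_to_select=10).fit(X_train, y_train)
rfe_mask = rfe.support_


# --- Genetic Algorithm: evolve boolean masks over the 30 features ---
def fitness(mask):
    """Mean 3-fold cross-validated accuracy on the training split."""
    if not mask.any():
        return 0.0
    return cross_val_score(make_clf(), X_train[:, mask], y_train, cv=3).mean()


def genetic_select(n_features, pop_size=12, generations=8,
                   mutation_rate=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_features)) < 0.5  # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Keep the better half unchanged (elitism) and use it as parents.
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.choice(len(parents), size=2, replace=False)]
            cut = rng.integers(1, n_features)              # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_features) < mutation_rate  # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]


ga_mask = genetic_select(X_train.shape[1])

rfe_result = evaluate(rfe_mask)
ga_result = evaluate(ga_mask)
print("RFE:", rfe_result)
print("GA: ", ga_result)
```

Which method scores higher in this sketch depends on the split, the seed, and the settings above, so its output should not be read as reproducing the paper's tables.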
Copyright (c) 2024 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of manuscripts published in this journal in other versions (e.g., depositing them in the author's institutional repository or publishing them in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.