Comparison of Genetic Algorithm and Recursive Feature Elimination on High Dimensional Data

  • Yoga Pristyanto Universitas Amikom Yogyakarta
  • Dipa Wirantanu Universitas Amikom Yogyakarta
Keywords: feature selection, high-dimensional data, recursive feature elimination, genetic algorithm

Abstract

Companies now widely use big data in file processing. Large files can slow down a company's processing time, so feature selection is used to reduce the number of features and address the problem of high-dimensional data. On the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, with 30 features and 569 data points, feature selection is performed using Recursive Feature Elimination (RFE) and a Genetic Algorithm (GA). The evaluation results of the two methods are then compared to determine which one better solves the problem. The evaluation results and discussion in Tables 1 to 14 show that GA-based feature selection achieves slightly higher accuracy than RFE, as well as slightly higher weighted- and macro-averaged precision, recall, and F1 score. It is therefore concluded that GA-based feature selection performs better than RFE on high-dimensional data.
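The abstract does not state which estimator or how many retained features were used for RFE, so the following is only a minimal, standard-library sketch of the RFE loop under stated assumptions. It assumes a logistic regression trained by plain batch gradient descent, with the absolute value of each weight used as that feature's importance; the authors may well have used a different estimator or a library implementation. The helpers `sigmoid`, `train_logreg`, `rfe`, and `make_toy_data` are hypothetical, and the toy data is synthetic (the label depends only on columns 0 and 1), not the WDBC data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.2):
    """Fit a logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, n_keep, step=1):
    """Recursive Feature Elimination (assumed linear-model variant):
    refit on the surviving features, drop the `step` features with the
    smallest |weight|, and repeat until `n_keep` features remain.
    Returns the surviving column indices in their original order."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_logreg(X_sub, y)
        # Positions within `remaining`, weakest (smallest |weight|) first.
        weakest_first = sorted(range(len(remaining)), key=lambda k: abs(w[k]))
        n_drop = min(step, len(remaining) - n_keep)
        dropped = {remaining[k] for k in weakest_first[:n_drop]}
        remaining = [j for j in remaining if j not in dropped]
    return remaining


def make_toy_data(n=200, n_noise=3, seed=0):
    """Hypothetical synthetic data: the label depends only on columns 0 and 1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.uniform(-1, 1) for _ in range(2 + n_noise)]
        X.append(row)
        y.append(1 if row[0] + row[1] > 0 else 0)
    return X, y
```

On the toy data, `rfe(*make_toy_data(), n_keep=2)` should keep the two informative columns, because the noise columns receive near-zero weights and are eliminated first. Refitting after every removal, rather than ranking features once, is what makes the procedure recursive.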

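The GA settings (population size, operators, fitness function, wrapped classifier) are not given in the abstract either. The sketch below shows one common way to use a GA as a wrapper feature selector, assuming a binary mask per individual, tournament selection, uniform crossover, bit-flip mutation, and elitism. Fitness is assumed to be validation accuracy of a simple nearest-centroid classifier minus a small penalty per selected feature. The functions `nearest_centroid_accuracy` and `ga_select` and all default parameter values are hypothetical choices for illustration, not the authors' configuration.

```python
import random


def nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va, cols):
    """Wrapped model: nearest-centroid classifier restricted to columns `cols`."""
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_va, y_va):
        # Predict the label whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda lab: sum(
            (x[j] - c) ** 2 for j, c in zip(cols, centroids[lab])))
        correct += pred == t
    return correct / len(y_va)


def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=30,
              p_mut=0.1, penalty=0.01, seed=0):
    """GA wrapper feature selection over binary masks (1 = feature kept).
    Fitness = validation accuracy - penalty * number of selected features."""
    rng = random.Random(seed)
    d = len(X_tr[0])
    cache = {}

    def fitness(mask):
        if mask not in cache:
            cols = [j for j in range(d) if mask[j]]
            acc = nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va, cols) if cols else 0.0
            cache[mask] = acc - penalty * len(cols)
        return cache[mask]

    def tournament(pop):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            # Uniform crossover: each gene comes from either parent with equal probability.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(tuple(child))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

Unlike RFE, which removes features greedily one ranking at a time, the GA searches over whole subsets, so it can recover combinations of features that are only useful together; the price is many more model evaluations, which the fitness cache partly offsets.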

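The comparison relies on accuracy plus macro- and weighted-averaged precision, recall, and F1 score. As a hedged illustration of how those two averages differ (the helper names `per_class_scores` and `classification_summary` are hypothetical), a macro average gives every class equal weight, while a weighted average weights each class by its support, i.e. its number of true instances. This sketch assumes a score of 0.0 when a denominator is zero.

```python
from collections import Counter


def per_class_scores(y_true, y_pred, label):
    """One-vs-rest precision, recall, and F1 for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def classification_summary(y_true, y_pred):
    """Accuracy plus macro- and weighted-averaged (precision, recall, F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    n = len(y_true)
    scores = {lab: per_class_scores(y_true, y_pred, lab) for lab in labels}
    # Macro: unweighted mean over classes.
    macro = tuple(sum(scores[lab][i] for lab in labels) / len(labels) for i in range(3))
    # Weighted: mean over classes weighted by class support.
    weighted = tuple(sum(scores[lab][i] * support[lab] for lab in labels) / n
                     for i in range(3))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "macro": macro, "weighted": weighted}
```

For `y_true = [0, 0, 0, 0, 1, 1]` and `y_pred = [0, 0, 0, 1, 1, 0]`, the macro-averaged precision, recall, and F1 are all 0.625, while the weighted averages are all 4/6 (about 0.667), because the larger class 0 scores higher and therefore pulls the weighted average up.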
Published: 2024-03-29
How to Cite: Pristyanto, Y., & Wirantanu, D. (2024). Comparison of Genetic Algorithm and Recursive Feature Elimination on High Dimensional Data. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 8(2), 189–198. https://doi.org/10.29207/resti.v8i2.5375
Section: Information Systems Engineering Articles