Comparative Analysis of Various Ensemble Algorithms for Computer Malware Prediction
Abstract
An estimated 29 billion devices were expected to be connected to the internet by 2022, making cybercrime a major threat. One of the most common forms of cybercrime is infection with malicious software (malware) designed to harm end users. Microsoft has the highest number of vulnerabilities among software companies, and its operating system (Windows) accounts for the largest share of them, 68.85%. Most malware research addresses infections after the malware has already reached a user's device. This study takes the opposite approach: it predicts the potential for malware infection on a user's device before the infection occurs. Similar studies still rely on single algorithms, whereas this study uses ensemble algorithms, which handle the bias-variance trade-off better. The study builds models from data on computer features that affect the likelihood of malware infection on devices running Microsoft Windows, using the ensemble algorithms Bagging Classifier, Random Forest, Light Gradient Boosting Machine, Extreme Gradient Boosting Machine, Category Boosting, and Stacking Classifier. The best model is a Stacking Classifier that combines Light Gradient Boosting Machine and Category Boosting Classifier, with training and test scores of 0.70665 and 0.64694, respectively. Important features have also been identified as a reference for setting policies to protect user devices from malware infection.
Copyright (c) 2023 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher, publishing under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published version of the manuscript (e.g. depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.