Comparative Analysis of Various Ensemble Algorithms for Computer Malware Prediction

Yusuf Bayu Wicaksono; Christina Juliane

doi:10.29207/resti.v7i3.4492

Yusuf Bayu Wicaksono, STMIK LIKMI
Christina Juliane, STMIK LIKMI

Keywords: malware, machine learning, ensemble algorithm, important feature

Abstract

An estimated 29 billion devices were expected to be connected to the internet by 2022, making cybercrime a major threat. One of the most common forms of cybercrime is infection with malicious software (malware) designed to harm end users. Microsoft has the highest number of vulnerabilities among software companies, with its operating system (Windows) accounting for the largest share of vulnerabilities at 68.85%. Most malware research addresses infections after malware has already reached a user's device. This study takes the opposite approach: it predicts the likelihood of malware infection on a user's device before the infection occurs. Similar studies rely on single algorithms, whereas this study uses ensemble algorithms, which handle the bias-variance trade-off better. The study builds models from data on computer features that affect the likelihood of malware infection on devices running Microsoft Windows, using the ensemble algorithms Bagging Classifier, Random Forest, Light Gradient Boosting Machine, Extreme Gradient Boosting Machine, Category Boosting, and Stacking Classifier. The best model is the Stacking Classifier, which combines Light Gradient Boosting Machine and Category Boosting Classifier, with training and test scores of 0.70665 and 0.64694, respectively. Important features are also identified as a reference for setting policies to protect user devices from malware infection.
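
The stacking approach named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes only scikit-learn is available, so `GradientBoostingClassifier` and `RandomForestClassifier` stand in for the Light Gradient Boosting Machine and Category Boosting base learners. The data is synthetic rather than the Microsoft malware dataset. ROC AUC is an assumed metric, since the abstract does not name the one behind its scores. Permutation importance is likewise an assumed way of ranking features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (assumption: stands in for the
# per-device features and infection labels used in the study).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two base learners (stand-ins, see lead-in) whose out-of-fold predictions
# become the inputs of a logistic-regression meta-learner.
base_learners = [
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=3,
)
stack.fit(X_train, y_train)

# Compare training and test scores; a large gap suggests overfitting.
train_auc = roc_auc_score(y_train, stack.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"train AUC: {train_auc:.5f}  test AUC: {test_auc:.5f}")

# Rank features by how much shuffling each one degrades the test score.
result = permutation_importance(
    stack, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=0
)
ranked = sorted(
    range(X.shape[1]), key=lambda i: result.importances_mean[i], reverse=True
)
print("features ranked by importance:", ranked)
```

Reporting the training and test scores side by side, as the abstract does, makes the generalization gap visible. The feature ranking is the kind of output that could inform the protection policies the abstract mentions.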
