Performance Analysis of Hybrid Machine Learning Methods on Imbalanced Data (Rainfall Classification)
Abstract
This study analyzes the performance of two hybrid machine learning methods, Voting and Stacking, on rainfall classification. Each hybrid method combines five classification methods: Logistic Regression, Support Vector Machine, Random Forest, Artificial Neural Network, and eXtreme Gradient Boosting. The data are Bandung City rainfall records from 2005 to 2021. Both hybrid methods are ensembles, meaning that they combine several individual classification models to improve the performance of the resulting model. The Voting algorithm is weak on imbalanced data, while Stacking is not. The results show that, combining the five machine learning methods on the imbalanced dataset, the Stacking algorithm reaches an accuracy of 99.60%. With the addition of the SMOTE (synthetic minority oversampling technique) step, the accuracy rises to 99.71%. Stacking performs better because it draws on the best classification output of each individual model and can overcome the imbalance. The models are evaluated not only on accuracy but also on precision, recall, and F1-score. The contribution of this research is to identify which hybrid method, Voting or Stacking, gives the better model performance on rainfall classification.
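To make the oversampling step concrete, the following is a minimal sketch of the idea behind SMOTE: new minority-class points are created by interpolating between an existing minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the two-feature toy data are assumptions made only for illustration, and a real study would more likely rely on an established library.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration: `minority` is a list of
    equal-length numeric tuples with at least two entries.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Made-up toy data: 12 majority samples and 4 minority samples, two features each.
majority = [(float(i), float(i % 3)) for i in range(12)]
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Oversample the minority class until both classes have the same size.
extra = smote(minority, n_new=len(majority) - len(minority), k=2, seed=1)
balanced_minority = minority + extra
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies instead of being exact duplicates. In practice, oversampling is usually applied to the training split only, so that evaluation data remain untouched.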
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the work, which is licensed under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of manuscripts published in this journal in other versions (e.g., deposit in the author's institutional repository or publication in a book), provided they acknowledge that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).