Performance Analysis of Hybrid Machine Learning Methods on Imbalanced Data (Rainfall Classification)

  • Aditya Gumilar, Telkom University
  • Sri Suryani Prasetiyowati, Telkom University
  • Yuliant Sibaroni, Telkom University
Keywords: Rainfall, Machine Learning, Hybrid Methods, Classification, SMOTE

Abstract

This study analyzes the performance of two hybrid machine learning methods, Voting and Stacking, on rainfall classification. Each hybrid method combines five classifiers: Logistic Regression, Support Vector Machine, Random Forest, Artificial Neural Network, and eXtreme Gradient Boosting. The data are rainfall records for Bandung City from 2005 to 2021. Both hybrid methods are ensembles, meaning that they combine several individual classification models to improve the performance of the resulting model. The Voting algorithm is weak on imbalanced data, whereas Stacking is not. The results show that, when the five machine learning methods are combined on the imbalanced dataset, the Stacking algorithm reaches an accuracy of 99.60%. With the addition of the SMOTE technique, its accuracy rises to 99.71%. Stacking performs better because it draws on the best classification output of each individual model and can therefore cope with the imbalance. Model evaluation is not limited to accuracy; it also covers precision, recall, and F1-score. The contribution of this research is to identify which hybrid method, Voting or Stacking, gives the better model performance on rainfall classification.
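The two ideas the abstract relies on, SMOTE oversampling and evaluation beyond accuracy, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (smote_oversample, evaluate), the toy data, and the choice of k are assumptions made for illustration only, and practical work would normally use a library implementation such as SMOTE from imbalanced-learn. The sketch shows how SMOTE creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, and why a high accuracy can hide a useless classifier on imbalanced data.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Each synthetic point lies on the segment joining a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 12 majority samples and 4 minority samples,
# each described by two made-up features.
majority = [(float(x), float(y)) for x in range(4) for y in range(3)]
minority = [(10.0, 10.0), (11.0, 10.0), (10.0, 12.0), (12.0, 11.0)]

# Generate enough synthetic minority samples to balance the two classes.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + synthetic

# A classifier that always predicts the majority class on a 95:5 split.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
scores = evaluate(y_true, y_pred)

print(len(majority), len(balanced_minority))
print(scores)
```

In the second half of the sketch, the always-majority predictor scores high on accuracy while its recall and F1 for the minority class are zero, which is why the study reports precision, recall, and F1-score alongside accuracy.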

Published
2022-07-15
How to Cite
Aditya Gumilar, Sri Suryani Prasetiyowati, & Yuliant Sibaroni. (2022). Performance Analysis of Hybrid Machine Learning Methods on Imbalanced Data (Rainfall Classification). Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(3), 481 - 490. https://doi.org/10.29207/resti.v6i3.4142
Section
Information Technology Articles
