Rotation Double Random Forest Algorithm to Predict the Food Insecurity Status of Households
Abstract
Tree ensemble methods have proven effective for classification problems. Their strength lies in the diversity and mutual independence of the constituent trees: increasing the diversity of independently grown decision trees improves model performance. Several studies have developed tree-ensemble algorithms that build trees independently of one another from varied inputs, including random forest (RF), rotation forest (RoF), double random forest (DRF), and, most recently, rotation double random forest (RoDRF). RoDRF rotates or transforms the data to produce greater diversity among the base learners, applying the concept of feature rotation to trees grown with the DRF algorithm. Random rotations or transformations of different feature subspaces yield different projections, which leads to better generalization and prediction performance. This study compares the performance of RoDRF with RF, RoF, and DRF on imbalanced data in a food insecurity case. Class imbalance is handled with two methods, EasyEnsemble and SMOTE-NC. The results show that the DRF model with EasyEnsemble yields the best performance among the algorithms tested; although its precision is 0.62274 and its AUC is 0.68501, the model predicts both classes equally well. Based on statistical tests, the average AUC values of the algorithms under the EasyEnsemble treatment differ significantly from one another. This study also uses SHAP to explain which variables contribute most to the model of household food insecurity status.
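To make the rotation idea concrete, the following minimal sketch (in Python, assuming scikit-learn and NumPy; the function names and the parameters n_trees and subspace_size are illustrative, not the authors' implementation) fits each decision tree on features rotated by PCA transformations estimated on random, disjoint feature subspaces and bootstrap samples, which is the diversity-injection mechanism that RoF and RoDRF build on.

# Minimal sketch of the rotation mechanism underlying RoF/RoDRF: each tree is trained
# on features rotated by PCA transformations fitted on random, disjoint feature
# subspaces and bootstrap samples. Names and parameters here are illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotation_ensemble(X, y, n_trees=10, subspace_size=3, seed=0):
    """Fit decision trees on randomly rotated feature subspaces (majority-vote ensemble)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    ensemble = []
    for _ in range(n_trees):
        # Randomly partition the features into disjoint subspaces.
        order = rng.permutation(n_features)
        subspaces = [order[i:i + subspace_size]
                     for i in range(0, n_features, subspace_size)]
        # Build a block rotation matrix from per-subspace PCAs, each fitted on a
        # bootstrap sample: this is what injects diversity between the trees.
        rotation = np.zeros((n_features, n_features))
        for cols in subspaces:
            boot = rng.choice(n_samples, size=n_samples, replace=True)
            pca = PCA(n_components=len(cols)).fit(X[np.ix_(boot, cols)])
            rotation[np.ix_(cols, cols)] = pca.components_.T
        tree = DecisionTreeClassifier(random_state=0).fit(X @ rotation, y)
        ensemble.append((rotation, tree))
    return ensemble

def predict_rotation_ensemble(ensemble, X):
    """Predict by majority vote over the rotated trees (integer class labels assumed)."""
    votes = np.stack([tree.predict(X @ rotation) for rotation, tree in ensemble]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)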
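The imbalance treatments and evaluation described above can likewise be sketched with standard libraries. The example below (assuming scikit-learn, imbalanced-learn, and shap; the synthetic data, the column indices, and the use of RandomForestClassifier as a stand-in for the compared ensembles are all placeholders) resamples with SMOTE-NC, trains an EasyEnsemble model, scores both by AUC, and computes SHAP values for feature attribution.

# Illustrative pipeline: SMOTE-NC and EasyEnsemble treatments, AUC evaluation, SHAP attribution.
# Synthetic data stands in for the household survey; RandomForestClassifier stands in for the
# tree ensembles compared in the paper.
import numpy as np
import shap
from imblearn.ensemble import EasyEnsembleClassifier
from imblearn.over_sampling import SMOTENC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(0, 3, n),             # encoded categorical feature (placeholder)
    rng.integers(0, 2, n),             # encoded categorical feature (placeholder)
    rng.normal(size=(n, 4)),           # numeric features (placeholder)
])
y = (rng.random(n) < 0.2).astype(int)  # imbalanced target: 1 = food insecure (placeholder)
categorical_cols = [0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Treatment 1: SMOTE-NC oversampling (handles mixed numeric/categorical data),
# followed by an ordinary tree ensemble on the balanced training set.
X_res, y_res = SMOTENC(categorical_features=categorical_cols,
                       random_state=0).fit_resample(X_train, y_train)
rf = RandomForestClassifier(random_state=0).fit(X_res, y_res)

# Treatment 2: EasyEnsemble, which trains learners on balanced, undersampled bags.
easy = EasyEnsembleClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

# AUC is reported because accuracy is misleading on imbalanced classes.
for name, model in [("SMOTE-NC + RF", rf), ("EasyEnsemble", easy)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.5f}")

# SHAP values indicate which variables drive the predicted food insecurity status;
# summary plots of these values highlight the most influential household variables.
shap_values = shap.TreeExplainer(rf).shap_values(X_test)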