Rotation Double Random Forest Algorithm to Predict the Food Insecurity Status of Households
Abstract
Tree ensemble methods have proven effective for classification problems. Their strength lies in the diversity and mutual independence of the constituent trees: increasing the diversity of independently grown decision trees improves model performance. Several studies have developed tree-ensemble algorithms that build trees independently of one another from varied inputs, including random forest (RF), rotation forest (RoF), double random forest (DRF), and, most recently, rotation double random forest (RoDRF). RoDRF rotates or transforms the data to produce greater diversity among the base learners, applying the concept of feature rotation to trees grown with the DRF algorithm. Random rotations or transformations of different feature subspaces yield different projections, which leads to better generalization and prediction performance. This study compares the performance of RoDRF with RF, RoF, and DRF on imbalanced data in a food insecurity case. Class imbalance is handled with two methods, EasyEnsemble and SMOTE-NC. The results show that the DRF model with EasyEnsemble yields the best performance among the algorithms tested; although its precision is 0.62274 and its AUC is 0.68501, the model predicts both classes equally well. Based on statistical tests, the average AUC values of the algorithms under the EasyEnsemble treatment differ significantly from one another. This study also uses SHAP to explain which variables contribute most to the model of household food insecurity status.
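To make the rotation idea concrete, the following minimal sketch (in Python, assuming scikit-learn and NumPy; the function names and the parameters n_trees and subspace_size are illustrative, not the authors' implementation) fits each decision tree on features rotated by PCA transformations estimated on random, disjoint feature subspaces and bootstrap samples, which is the diversity-injection mechanism that RoF and RoDRF build on.

# Minimal sketch of the rotation mechanism underlying RoF/RoDRF: each tree is trained
# on features rotated by PCA transformations fitted on random, disjoint feature
# subspaces and bootstrap samples. Names and parameters here are illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotation_ensemble(X, y, n_trees=10, subspace_size=3, seed=0):
    """Fit decision trees on randomly rotated feature subspaces (majority-vote ensemble)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    ensemble = []
    for _ in range(n_trees):
        # Randomly partition the features into disjoint subspaces.
        order = rng.permutation(n_features)
        subspaces = [order[i:i + subspace_size]
                     for i in range(0, n_features, subspace_size)]
        # Build a block rotation matrix from per-subspace PCAs, each fitted on a
        # bootstrap sample: this is what injects diversity between the trees.
        rotation = np.zeros((n_features, n_features))
        for cols in subspaces:
            boot = rng.choice(n_samples, size=n_samples, replace=True)
            pca = PCA(n_components=len(cols)).fit(X[np.ix_(boot, cols)])
            rotation[np.ix_(cols, cols)] = pca.components_.T
        tree = DecisionTreeClassifier(random_state=0).fit(X @ rotation, y)
        ensemble.append((rotation, tree))
    return ensemble

def predict_rotation_ensemble(ensemble, X):
    """Predict by majority vote over the rotated trees (integer class labels assumed)."""
    votes = np.stack([tree.predict(X @ rotation) for rotation, tree in ensemble]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)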
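The imbalance treatments and evaluation described above can likewise be sketched with standard libraries. The example below (assuming scikit-learn, imbalanced-learn, and shap; the synthetic data, the column indices, and the use of RandomForestClassifier as a stand-in for the compared ensembles are all placeholders) resamples with SMOTE-NC, trains an EasyEnsemble model, scores both by AUC, and computes SHAP values for feature attribution.

# Illustrative pipeline: SMOTE-NC and EasyEnsemble treatments, AUC evaluation, SHAP attribution.
# Synthetic data stands in for the household survey; RandomForestClassifier stands in for the
# tree ensembles compared in the paper.
import numpy as np
import shap
from imblearn.ensemble import EasyEnsembleClassifier
from imblearn.over_sampling import SMOTENC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(0, 3, n),             # encoded categorical feature (placeholder)
    rng.integers(0, 2, n),             # encoded categorical feature (placeholder)
    rng.normal(size=(n, 4)),           # numeric features (placeholder)
])
y = (rng.random(n) < 0.2).astype(int)  # imbalanced target: 1 = food insecure (placeholder)
categorical_cols = [0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Treatment 1: SMOTE-NC oversampling (handles mixed numeric/categorical data),
# followed by an ordinary tree ensemble on the balanced training set.
X_res, y_res = SMOTENC(categorical_features=categorical_cols,
                       random_state=0).fit_resample(X_train, y_train)
rf = RandomForestClassifier(random_state=0).fit(X_res, y_res)

# Treatment 2: EasyEnsemble, which trains learners on balanced, undersampled bags.
easy = EasyEnsembleClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

# AUC is reported because accuracy is misleading on imbalanced classes.
for name, model in [("SMOTE-NC + RF", rf), ("EasyEnsemble", easy)]:
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.5f}")

# SHAP values indicate which variables drive the predicted food insecurity status;
# summary plots of these values highlight the most influential household variables.
shap_values = shap.TreeExplainer(rf).shap_values(X_test)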