Applying Different Resampling Strategies In Random Forest Algorithm To Predict Lumpy Skin Disease

Suparyati Suparyati; Emma Utami; Alva Hendi Muhammad

doi:10.29207/resti.v6i4.4147

Suparyati Suparyati Universitas Amikom Yogyakarta
Emma Utami Universitas Amikom Yogyakarta
Alva Hendi Muhammad Universitas Amikom Yogyakarta

Keywords: Genetic Algorithm, Hyperparameter Tuning, Lumpy Skin Disease, Machine Learning, Random Forest

Abstract

Lumpy Skin Disease (LSD), which infects livestock, is spreading in many parts of the world. Early detection of its spread is needed to limit the economic losses it causes. Machine learning algorithms have already been used to predict the presence of disease, including in animal health. This study aims to predict the presence of LSD in an area using the LSD dataset obtained from Mendeley Data. Lumpy-infected cases are so rare that the data are imbalanced, which makes training machine learning models difficult. The imbalance is handled by resampling with two techniques: Random Under-sampling and the Synthetic Minority Oversampling Technique (SMOTE). A Random Forest classification model was trained on the resampled data to predict cases of lumpy infection. The Random Forest classifier performs very well on both the under-sampled and the oversampled data. The performance metrics show that SMOTE scores 1-2% higher than Random Under-sampling. In particular, recall, the metric we want to maximize when identifying lumpy cases, is higher with SMOTE, and precision is also slightly better than with Random Under-sampling. This research focuses only on balancing the imbalanced classes; model optimization has not been implemented and is left for future research.
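The two resampling strategies and the recall and precision metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not say which implementation was used, so the sketch relies only on the Python standard library. The function names (`random_undersample`, `smote_oversample`, `precision_recall`), the parameter `k`, and the toy data are all assumptions made for illustration, and the SMOTE function is a simplified version of the technique (interpolation between a minority point and one of its nearest minority neighbours).

```python
import math
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class shrinks to the size of the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_oversample(X, y, minority, k=3, seed=0):
    """Simplified SMOTE: add synthetic minority rows until the minority
    class is as large as the largest class. Needs at least two minority rows."""
    rng = random.Random(seed)
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_target = max(Counter(y).values())
    X_out, y_out = list(X), list(y)
    for _ in range(n_target - len(minority_rows)):
        base = rng.choice(minority_rows)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: math.dist(base, r),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (here: lumpy-infected) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up toy data (not the Mendeley LSD dataset): 4 positives, 16 negatives.
X = [[float(i), float(i % 5)] for i in range(20)]
y = [1 if i < 4 else 0 for i in range(20)]

X_under, y_under = random_undersample(X, y)
X_smote, y_smote = smote_oversample(X, y, minority=1)
print(Counter(y_under), Counter(y_smote))
```

Under-sampling discards majority rows, while SMOTE keeps every original row and adds interpolated minority rows, so the two strategies hand the classifier training sets of very different sizes. In practice the resampling would normally be applied to the training split only, so that the evaluation data keep the original class ratio.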
