Gradient Boosting Machine, Random Forest dan Light GBM untuk Klasifikasi Kacang Kering

Gradient Boosting Machine, Random Forest and Light GBM for Classification of Dried Beans

  • Indrawata Wardhana
  • Musi Ariawijaya UIN Sulthan Thaha Saifuddin Jambi
  • Vandri Ahmad Isnaini UIN Sulthan Thaha Saifuddin Jambi
  • Rahmi Putri Wirman UIN Sulthan Thaha Saifuddin Jambi
Keywords: GBM, RF, LightGBM, Bean Classification, Box-Cox


Bean seed classification is critical for determining bean quality. The same dataset was previously classified with multilayer perceptron (MLP), support vector machine (SVM), k-nearest neighbors (KNN), and decision tree (DT) algorithms, of which SVM gave the best results. This study aims to identify the most effective model by applying the Box-Cox transformation in the feature selection stage and comparing random forest (RF), gradient boosting machine (GBM), and LightGBM classifiers, evaluated with repeated k-fold cross-validation. The Dry Bean dataset is available from the UCI Machine Learning Repository. The Box-Cox transformation and repeated k-fold cross-validation improved classification accuracy. The optimal training configurations were a random forest with 50 decision trees and a maximum depth of 10, a GBM with a learning rate of 1, and a LightGBM model with a learning rate of 0.5 and 500 estimators. LightGBM achieved the best training accuracy, 99 percent, but only 91 percent validation accuracy. By class, the Barbunya, Bombay, Cali, Dermason, Horoz, Seker, and Sira beans reached accuracies of 91, 100, 92, 92, 95, 94, and 84 percent, respectively.
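
The two steps the abstract credits with improving accuracy, the Box-Cox transformation and repeated k-fold cross-validation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. The Box-Cox transform maps a strictly positive value x to (x^λ - 1)/λ for λ ≠ 0 and to ln x for λ = 0; here λ is chosen by a simple grid search over [-2, 2] that maximizes the normal profile log-likelihood, and the folds are plain (unstratified) shuffled splits. The function names `boxcox`, `boxcox_loglik`, `fit_boxcox_lambda`, and `repeated_kfold` are hypothetical.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def boxcox(values: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def boxcox_loglik(values: Sequence[float], lam: float) -> float:
    """Normal profile log-likelihood of lam (up to an additive constant)."""
    n = len(values)
    y = boxcox(values, lam)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    # The Jacobian term (lam - 1) * sum(ln x) accounts for the change of variables.
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in values)


def fit_boxcox_lambda(values: Sequence[float], lo: float = -2.0,
                      hi: float = 2.0, steps: int = 401) -> float:
    """Pick lam on an evenly spaced grid over [lo, hi] by maximum likelihood.

    Assumes the values are not all identical (otherwise the variance is zero).
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


def repeated_kfold(n_samples: int, n_splits: int, n_repeats: int,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs: k shuffled folds, reshuffled each repeat."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    rng = random.Random(seed)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Spread the remainder over the first folds so sizes differ by at most one.
        base, extra = divmod(n_samples, n_splits)
        start = 0
        for fold in range(n_splits):
            size = base + (1 if fold < extra else 0)
            test = idx[start:start + size]
            train = idx[:start] + idx[start + size:]
            yield train, test
            start += size
```

When combining the two, λ would normally be fitted on each training split only and then applied to the held-out fold, so validation accuracy is not inflated by leakage. Equivalent library routines exist, for example `scipy.stats.boxcox`, scikit-learn's `PowerTransformer(method="box-cox")`, and `RepeatedKFold` or `RepeatedStratifiedKFold`. Reading the abstract's settings as scikit-learn and LightGBM parameters would suggest something like `RandomForestClassifier(n_estimators=50, max_depth=10)`, `GradientBoostingClassifier(learning_rate=1.0)`, and `LGBMClassifier(learning_rate=0.5, n_estimators=500)`, but the abstract does not state which libraries or remaining settings were used.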




D. F. Barbin et al., "Classification and compositional characterization of different varieties of cocoa beans by near infrared spectroscopy and multivariate statistical analyses," J. Food Sci. Technol., vol. 55, no. 7, pp. 2457-2466, Apr. 2018.

R. Megías-Pérez, S. Grimbs, R. N. D'Souza, H. Bernaert, and N. Kuhnert, "Profiling, quantification and classification of cocoa beans based on chemometric analysis of carbohydrates using hydrophilic interaction liquid chromatography coupled to mass spectrometry," Food Chem., vol. 258, pp. 284-294, Aug. 2018.

A. J. Myles, S. D. Brown, and T. A. Zimmerman, "Transfer of Multivariate Classification Models Between Laboratory and Process Near-Infrared Spectrometers for the Discrimination of Green Arabica and Robusta Coffee Beans," Appl. Spectrosc., vol. 60, no. 10, pp. 1198-1203, Oct. 2006.

F. Kurniawan, I. W. Budiastra, Sutrisno, and S. Widyotomo, "Classification of Arabica Java Coffee Beans Based on Their Origin using NIR Spectroscopy," IOP Conf. Ser. Earth Environ. Sci., vol. 309, no. 1, p. 012006, Sep. 2019.

A. Vázquez-Ovando, F. Molina-Freaner, J. Nuñez-Farfán, D. Betancur-Ancona, and M. Salvador-Figueroa, "Classification of cacao beans (Theobroma cacao L.) of southern Mexico based on chemometric analysis with multivariate approach," Eur. Food Res. Technol., vol. 240, no. 6, pp. 1117-1128, Feb. 2015.

Y. Li, J. Sun, X. Wu, Q. Chen, B. Lu, and C. Dai, "Detection of viability of soybean seed based on fluorescence hyperspectra and CARS-SVM-AdaBoost model," J. Food Process. Preserv., vol. 43, no. 12, p. e14238, Dec. 2019.

M. Koklu and I. A. Ozkan, "Multiclass classification of dry beans using computer vision and machine learning techniques," Comput. Electron. Agric., vol. 174, p. 105507, Jul. 2020.

S. A. Araújo, W. A. L. Alves, P. A. Belan, and K. P. Anselmo, "A Computer Vision System for Automatic Classification of Most Consumed Brazilian Beans," Lect. Notes Comput. Sci., vol. 9475, pp. 45-53, Dec. 2015.

E. M. De Oliveira, D. S. Leme, B. H. G. Barbosa, M. P. Rodarte, and R. G. F. Alvarenga Pereira, "A computer vision system for coffee beans classification based on computational intelligence techniques," J. Food Eng., vol. 171, pp. 22-27, Feb. 2016.

F. A. Santos, A. M. P. Canuto, B. R. C. Bedregal, E. S. Palmeira, and I. N. P. Silva, "Supervised Methods Applied to the Construction of a Vision System for the Classification of Cocoa Beans in the Cut-Test," An. do Encontro Nac. Inteligência Artif. e Comput., pp. 72-83, Oct. 2019.

A. J. Ona Ona, F. Grijalva, K. Proano, B. Acuna, and M. Garcia, "Classification of fresh cocoa beans with pulp based on computer vision," 2020 IEEE ANDESCON, ANDESCON 2020, Oct. 2020.

R. M. Sakia, "The Box‐Cox transformation technique: a review," J. R. Stat. Soc. Ser. D …, 1992.

M. J. Gurka, L. J. Edwards, K. E. Muller, and L. L. Kupper, "Extending the Box-Cox transformation to the linear mixed model," J. R. Stat. Soc. Ser. A, vol. 169, no. 2, pp. 273-288, Mar. 2006.

J. Osborne, "Improving your data transformations: Applying the Box-Cox transformation," Pract. Assessment, Res. …, 2010.

M. Z. Hossain, "The use of Box-Cox transformation technique in economic and statistical analyses," J. Emerg. Trends Econ. …, 2011.

L. Wang, X. Zhou, X. Zhu, Z. Dong, and W. Guo, "Estimation of biomass in wheat using random forest regression algorithm and remote sensing data," Crop J., vol. 4, no. 3, pp. 212-219, Jun. 2016.

A. T. Azar, H. I. Elshazly, A. E. Hassanien, and A. M. Elkorany, "A random forest classifier for lymph diseases," Comput. Methods Programs Biomed., vol. 113, no. 2, pp. 465-473, Feb. 2014.

S. Vitrack-Tamam et al., "Random Forest Algorithm Improves Detection of Physiological Activity Embedded within Reflectance Spectra Using Stomatal Conductance as a Test Case," Remote Sens., vol. 12, no. 14, p. 2213, Jul. 2020.

S. Agajanian, O. Oluyemi, and G. M. Verkhivker, "Integration of random forest classifiers and deep convolutional neural networks for classification and biomolecular modeling of cancer driver mutations," Front. Mol. Biosci., vol. 6, p. 44, 2019.

C. L. Eng, J. C. Tong, and T. W. Tan, "Predicting host tropism of influenza A virus proteins using random forest," BMC Med. Genomics, vol. 7, no. 3, pp. 1-11, Dec. 2014.

S. Touzani, J. Granderson, and S. Fernandes, "Gradient boosting machine for modeling the energy consumption of commercial buildings," Energy Build., vol. 158, pp. 1533-1543, Jan. 2018.

Y. Zhang and A. Haghani, "A gradient boosting method to improve travel time prediction," Transp. Res. Part C Emerg. Technol., vol. 58, pp. 308-324, Sep. 2015.

C. Bentéjac, A. Csörgő, and G. Martínez-Muñoz, "A comparative analysis of gradient boosting algorithms," Artif. Intell. Rev., vol. 54, no. 3, pp. 1937-1967, Aug. 2020.

G. Ke et al., "LightGBM: A Highly Efficient Gradient Boosting Decision Tree," Adv. Neural Inf. Process. Syst., vol. 30, 2017.

X. Ma, J. Sha, D. Wang, Y. Yu, Q. Yang, and X. Niu, "Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning," Electron. Commer. Res. Appl., vol. 31, pp. 24-39, Sep. 2018.

D. Wang, Y. Zhang, and Y. Zhao, "LightGBM: An effective miRNA classification method in breast cancer patients," ACM Int. Conf. Proceeding Ser., pp. 7-11, Oct. 2017.

"UCI Machine Learning Repository: Dry Bean Dataset Data Set." [Online]. Available: [Accessed: 26-Nov-2021].

M. Pal, "Random forest classifier for remote sensing classification,", vol. 26, no. 1, pp. 217-222, Jan. 2007.

How to Cite
Wardhana, I., Ariawijaya, M., Isnaini, V. A., & Wirman, R. P. (2022). Gradient Boosting Machine, Random Forest dan Light GBM untuk Klasifikasi Kacang Kering. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(1), 92-99.
Information Systems Engineering Articles