Educational Data Mining for Predicting Student Graduation Using the Naïve Bayes Classifier Algorithm

Educational Data Mining untuk Prediksi Kelulusan Mahasiswa Menggunakan Algoritme Naïve Bayes Classifier

  • Edi Sutoyo, Information Systems Study Program, Faculty of Industrial Engineering, Telkom University
  • Ahmad Almaarif, Information Systems Study Program, Faculty of Industrial Engineering, Telkom University
Keywords: Data Mining, Classification, Naive Bayes Classifier, Student Graduation


Student quality is reflected in academic achievement, which is evidence of the effort students have made. Academic achievement is evaluated at the end of each semester to determine the learning outcomes attained. A student who fails to meet the academic criteria required to continue their studies is at risk of not graduating on time or of dropping out (DO). The number of students who graduate late or drop out of higher education institutions can be reduced by identifying at-risk students early in their studies and by supporting them with policies that guide them toward completing their education. In addition, if a student's time to completion can be predicted, interventions can be targeted more effectively. Data mining is one technique that can be used to make such predictions. This study therefore uses the Naive Bayes Classifier (NBC) algorithm to predict student graduation at Telkom University. The dataset, obtained from the Information Systems Directorate (SISFO) of Telkom University, contains 4,000 instances. The results show that NBC can be applied to predict student graduation, achieving an accuracy of 73.725%, a precision of 0.742, a recall of 0.736, and an F-measure of 0.735.
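As an illustration of the approach named in the abstract, the following is a minimal sketch of a categorical Naive Bayes classifier with Laplace smoothing, together with the four reported metric types (accuracy, precision, recall, F-measure). It is not the authors' implementation: the feature names (`gpa_band`, `credits_band`), the class labels, and the toy records are hypothetical and are not drawn from the SISFO dataset, and the abstract does not say how the metrics were averaged across classes, so the sketch computes them for a single class chosen as the positive one.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_nb(model, row):
    """Return the label with the highest log posterior under Naive Bayes."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Laplace (add-one) smoothing avoids zero probabilities
            score += math.log((count + 1) / (n + len(feature_values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def evaluate(actual, predicted, positive):
    """Accuracy, plus precision, recall, and F-measure for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure

# Hypothetical toy records: (gpa_band, credits_band) -> graduation outcome
train_rows = [("high", "full"), ("high", "full"), ("high", "partial"), ("mid", "full"),
              ("mid", "partial"), ("low", "partial"), ("low", "full"), ("low", "partial")]
train_labels = ["on_time", "on_time", "on_time", "on_time",
                "late", "late", "late", "late"]

test_rows = [("high", "full"), ("low", "partial"), ("mid", "full"), ("mid", "partial")]
test_labels = ["on_time", "late", "late", "late"]

model = train_nb(train_rows, train_labels)
predictions = [predict_nb(model, row) for row in test_rows]
accuracy, precision, recall, f_measure = evaluate(test_labels, predictions, "on_time")
print(predictions, accuracy, precision, recall, f_measure)
```

The classifier treats every attribute as conditionally independent given the class, which is the defining assumption of Naive Bayes. Working in log space keeps the product of many small probabilities numerically stable.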





How to Cite
Sutoyo, E., & Almaarif, A. (2020). Educational Data Mining for Predicting Student Graduation Using the Naïve Bayes Classifier Algorithm. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 4(1), 95–101.
Information Technology Articles