Classifying Quranic Verse Topics using Word Centrality Measure

Ferdian Yulianto; Kemas Muslim Lhaksmana; Danang Triantoro Murdiansyah

doi:10.29207/resti.v5i3.3171

Ferdian Yulianto Telkom University
Kemas Muslim Lhaksmana Telkom University
Danang Triantoro Murdiansyah Telkom University

Keywords: The Holy Quran, centrality, topic classification, SVM, naive Bayes, multilabel classification

Abstract

Muslims believe that the Quran, as the speech of Allah, is a miracle with distinctive properties of its own. Among the properties that have been studied are regularities in the number of letters, words, vocabulary items, and so on. Early Islamic scholars identified these regularities manually, i.e., by counting the occurrences of each vocabulary item by hand. This research takes a computational approach by using word centrality in Quranic verse topic classification. The goal is to analyze the effect of a Quran word centrality measure on the topic classification of Quran verses. To this end, a Quran word graph is constructed, and the centrality score of each word is then included as one of the features in verse topic classification. The effect of centrality is observed with support vector machine (SVM) and naïve Bayes classifiers under two scenarios: with and without stopword removal. The results show that, according to the centrality measure, the word “الله” (Allah) is the most central in the Quran. The performance evaluation of the classification models shows that using centrality improves (that is, lowers) the Hamming loss from 0.43 to 0.21 for the naïve Bayes classifier with stopword removal. Finally, both classification methods perform better on the word graph built with stopword removal.
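The pipeline described in the abstract (word graph, centrality score as a feature, Hamming loss as the metric) can be illustrated with a minimal sketch. The abstract does not state which centrality measure or edge definition was used, so the sketch assumes degree centrality on a graph where two words are linked when they occur in the same verse. The function names, the toy English verses, the stopword list, and the mean-centrality verse feature are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_word_graph(verses, stopwords=frozenset()):
    """Undirected graph: words are nodes; an edge links two words
    that occur in the same verse (an assumed edge definition)."""
    graph = defaultdict(set)
    for verse in verses:
        words = {w for w in verse.split() if w not in stopwords}
        for w in words:
            graph[w]  # register the node even if it has no neighbours
        for a, b in combinations(sorted(words), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree of each word divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {w: 0.0 for w in graph}
    return {w: len(neighbours) / (n - 1) for w, neighbours in graph.items()}


def verse_centrality_feature(verse, centrality, stopwords=frozenset()):
    """Hypothetical verse-level feature: mean centrality of the verse's words."""
    words = [w for w in verse.split() if w not in stopwords and w in centrality]
    return sum(centrality[w] for w in words) / len(words) if words else 0.0


def hamming_loss(true_labels, predicted_labels):
    """Fraction of (verse, topic) label slots that are predicted wrongly."""
    total = wrong = 0
    for true_row, pred_row in zip(true_labels, predicted_labels):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# Toy data, for illustration only (not from the Quran corpus used in the paper).
verses = ["allah is merciful", "allah is forgiving", "be patient and pray"]
stopwords = frozenset({"is", "and", "be"})

graph = build_word_graph(verses, stopwords)
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
feature = verse_centrality_feature(verses[0], centrality, stopwords)

# Two verses, three topic labels each; two of the six slots are wrong.
loss = hamming_loss([[1, 0, 1], [0, 1, 0]], [[1, 1, 1], [0, 1, 1]])
```

In a multilabel setting, a feature such as `feature` would be appended to each verse's other features before training the SVM or naïve Bayes model. A lower Hamming loss is better, which is why the reported drop from 0.43 to 0.21 counts as an improvement.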

