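Analisis Metode Representasi Teks Untuk Deteksi Interelasi Kitab Hadis: Systematic Literature Review [Analysis of Text Representation Methods for Detecting Interrelations Between Hadith Books: A Systematic Literature Review]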

Amelia Devi Putri Ariyanto; Chastine Fatichah; Agus Zainal Arifin

doi:10.29207/resti.v5i5.3499

Amelia Devi Putri Ariyanto Institut Teknologi Sepuluh Nopember
Chastine Fatichah
Agus Zainal Arifin

Keywords: Hadith, Interrelation, Text Document Classification, Text Representation

Abstract

Hadith is the second source of Islamic law after the Qur'an; it elaborates on Qur'anic verses whose meaning is still general by referring to the provisions of the Prophet Muhammad SAW. Text document classification can be used to address the problem of interrelation between the Qur'an and hadith. Interrelation between hadith books also needs to be addressed, because some hadiths in one book have the same meaning as hadiths in other books. This study analyzes the development of text representation and classification methods suited to the semantic similarity problem that arises when detecting interrelations between hadith books. The research method is a Systematic Literature Review (SLR) of sources from Google Scholar, Science Direct, and IEEE, covering 42 publications. The results show that contextual embedding, the newest text representation method, captures word context and sentence meaning better than static embedding. For classification, ensemble methods perform better on text documents than a single classifier model. Future research can therefore consider combining contextual embedding and ensemble methods to detect interrelations between hadith books.
