Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith
Abstract
Hadiths have several levels of authenticity, among them weak (dhaif) and fabricated (maudhu) hadiths, which may not originate from the Prophet Muhammad PBUH and thus should not be considered when deriving Islamic law (sharia). However, ordinary Muslims commonly mistake many such hadiths for authentic ones. To make such hadiths easier to identify, this paper proposes a method that checks the authenticity of a hadith by comparing it with a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and performs spelling correction with SymSpell to test whether spell checking improves the accuracy of hadith retrieval. This has not been done in previous work, and typos are common in Indonesian-translated hadiths found in raw text on the Web and social media. The experimental results show that spell checking improves mean average precision from 73% to 81% and recall from 80% to 89%. This improvement in accuracy makes the hadith retrieval system more feasible, and spelling correction is recommended for future work because it can correct the typos that are common in raw text on the Internet.
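The pipeline described in the abstract (correct the query's spelling, then rank documents with the vector space model) can be illustrated with a minimal sketch. It is not the authors' implementation. The brute-force Levenshtein lookup below is a simple stand-in for SymSpell and does not reproduce SymSpell's algorithm. The TF-IDF weighting variant, all function names, and the three placeholder sentences (ordinary Indonesian text, not hadiths) are assumptions made for illustration only.

```python
import math
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance by dynamic programming (stand-in for SymSpell)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def correct(token, vocab, max_dist=2):
    """Return the closest corpus word within max_dist, else the token itself."""
    if token in vocab:
        return token
    best, best_key = token, None
    for word, freq in vocab.items():
        d = edit_distance(token, word)
        if d <= max_dist:
            key = (d, -freq, word)  # prefer closer, then more frequent words
            if best_key is None or key < best_key:
                best, best_key = word, key
    return best


def tokenize(text):
    return text.lower().split()


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; tokens unseen in the corpus are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: count * idf[t] for t, count in tf.items()}


def build_index(docs):
    tokenized = [tokenize(d) for d in docs]
    vocab = Counter(t for doc in tokenized for t in doc)
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumed weighting: idf = ln(N / df) + 1
    idf = {t: math.log(len(docs) / df[t]) + 1.0 for t in df}
    return vocab, idf, [tfidf(doc, idf) for doc in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, index, use_spelling=True):
    """Rank documents by cosine similarity; returns (score, doc_id) pairs."""
    vocab, idf, vectors = index
    tokens = tokenize(query)
    if use_spelling:
        tokens = [correct(t, vocab) for t in tokens]
    q = tfidf(tokens, idf)
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Placeholder corpus: invented everyday sentences, not hadith text.
docs = [
    "buku perpustakaan dibuka setiap pagi",
    "pasar tradisional menjual sayur segar",
    "hujan deras turun sepanjang malam",
]
index = build_index(docs)

with_fix = search("pasr tradisonal", index, use_spelling=True)
without_fix = search("pasr tradisonal", index, use_spelling=False)
```

In this toy setup, the misspelled query matches no indexed term unless it is corrected first, so every document scores zero without the correction step, while the corrected query ranks the second sentence first. A real system would replace the linear vocabulary scan with a SymSpell-style lookup, which is designed for much faster candidate retrieval.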
Copyright (c) 2020 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of manuscripts published in this journal in other versions (e.g., deposited in the author's institutional repository or published in a book), acknowledging that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.