Analysis of Stylometric Features and Segmentation Strategies in Intrinsic Plagiarism Detection System

  • Sylvia Putri Gunawan Universitas Kristen Duta Wacana
  • Lucia Dwi Krisnawati Universitas Kristen Duta Wacana
  • Antonius Rachmat Chrismanto Universitas Kristen Duta Wacana
Keywords: intrinsic plagiarism detection, stylometry features, text segmentation, outlier

Abstract

Two paradigms in the field of plagiarism detection have resulted in External Plagiarism Detection (EPD) and Intrinsic Plagiarism Detection (IPD) systems. The more commonly applied system is EPD, which requires its algorithm to make a heuristic comparison between a suspicious document and the documents in a corpus. In contrast, given only a suspicious document, an IPD algorithm should be able to find plagiarized sections by looking for text segments whose writing style differs. Previous research on Indonesian texts has been confined to the development of EPD systems. Therefore, this research contributes experiments on, and an analysis of, stylometric features and segmentation strategies for building an IPD system for Indonesian texts. The experimental results show that paragraph segmentation performs better, scoring 0.92 for macro-averaged accuracy and 0.54 for macro-averaged F1. The stylometric features achieving the highest F1 and accuracy scores are the frequency of punctuation, the average paragraph length, and the type-token ratio.
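To make the three best-scoring features concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes paragraphs are separated by blank lines, that tokens are runs of word characters, and that style outliers can be flagged with a simple z-score rule. The function names, the feature definitions, and the threshold of 1.5 are all hypothetical choices made for illustration.

```python
import re
import string
from statistics import mean, pstdev


def split_paragraphs(text):
    """Split a document into paragraph segments on blank lines (assumed convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def stylometric_features(segment):
    """Compute three simple style features for one text segment.

    The exact definitions are assumptions, not taken from the paper:
    punctuation frequency is punctuation characters per character,
    length is the token count, and type-token ratio is unique tokens
    divided by total tokens.
    """
    tokens = re.findall(r"\w+", segment.lower())
    n_tokens = len(tokens)
    n_punct = sum(1 for ch in segment if ch in string.punctuation)
    return {
        "punct_freq": n_punct / len(segment) if segment else 0.0,
        "length_tokens": n_tokens,
        "type_token_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
    }


def flag_outliers(values, threshold=1.5):
    """Flag values whose absolute z-score exceeds a hypothetical threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # All segments look identical on this feature; nothing stands out.
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]


# Usage: extract features per paragraph, then flag paragraphs whose
# type-token ratio deviates from the rest of the document.
document = "Ini paragraf pertama.\n\nIni paragraf kedua, agak berbeda!\n\nParagraf ketiga."
segments = split_paragraphs(document)
features = [stylometric_features(s) for s in segments]
suspicious = flag_outliers([f["type_token_ratio"] for f in features])
```

A per-feature z-score is only one possible outlier criterion; it is used here because it needs no reference corpus, which matches the intrinsic setting described in the abstract.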

 

Published
2020-10-30
How to Cite
Gunawan, S. P., Krisnawati, L. D., & Chrismanto, A. R. (2020). Analysis of Stylometric Features and Segmentation Strategies in Intrinsic Plagiarism Detection System. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 4(5), 988-997. https://doi.org/10.29207/resti.v4i5.2486
Section
Information Systems Engineering Articles