Analysis of Stylometric Features and Segmentation Strategies in Intrinsic Plagiarism Detection System

  • Sylvia Putri Gunawan Universitas Kristen Duta Wacana
  • Lucia Dwi Krisnawati Universitas Kristen Duta Wacana
  • Antonius Rachmat Chrismanto Universitas Kristen Duta Wacana
Keywords: intrinsic plagiarism detection, stylometry features, text segmentation, outlier


Plagiarism detection follows two different paradigms, which result in External Plagiarism Detection (EPD) and Intrinsic Plagiarism Detection (IPD) systems. EPD is the more commonly applied system; its algorithm makes a heuristic comparison between a suspicious document and the documents in a corpus. In contrast, an IPD algorithm is given only the suspicious document and should find the plagiarized sections by looking for text segments that have a different writing style. Previous research on Indonesian texts has addressed only the development of EPD systems. Therefore, this research focuses on experimenting with and analyzing stylometric features and segmentation strategies to build an IPD system for Indonesian texts. The experimental results show that the paragraph segment performs better, scoring 0.92 for macro-averaged accuracy and 0.54 for macro-averaged F1. The stylometric features achieving the highest F1 and accuracy scores are the frequency of punctuation, the average paragraph length, and the type-token ratio.
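
The idea described in the abstract can be illustrated with a minimal sketch: segment a document into paragraphs, compute the three best-performing features for each segment, and flag segments whose values deviate strongly from the rest. The abstract does not state which outlier method the study uses, so the z-score rule, the threshold of 2.0, the per-segment token count standing in for paragraph length, and all function names below are assumptions made only for illustration.

```python
import re
import statistics
import string


def segment_paragraphs(text):
    """Split a document into paragraph segments on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def stylometric_features(paragraph):
    """Compute three simple stylometric features for one segment."""
    tokens = re.findall(r"\w+", paragraph.lower())
    n_tokens = len(tokens)
    n_punct = sum(1 for ch in paragraph if ch in string.punctuation)
    return {
        # punctuation marks per character of the segment
        "punct_freq": n_punct / max(len(paragraph), 1),
        # segment length in word tokens (assumed proxy for paragraph length)
        "length": n_tokens,
        # type-token ratio: distinct tokens divided by all tokens
        "ttr": len(set(tokens)) / max(n_tokens, 1),
    }


def flag_outliers(text, threshold=2.0):
    """Return indices of segments that are outliers on any feature.

    Assumption: a segment is an outlier when its z-score on a feature
    exceeds `threshold`; the study may use a different criterion.
    """
    paragraphs = segment_paragraphs(text)
    feats = [stylometric_features(p) for p in paragraphs]
    flagged = set()
    for name in ("punct_freq", "length", "ttr"):
        values = [f[name] for f in feats]
        if len(values) < 2:
            continue
        mean = statistics.mean(values)
        stdev = statistics.pstdev(values)
        if stdev == 0:
            continue  # feature is constant, so it cannot separate segments
        for i, value in enumerate(values):
            if abs(value - mean) / stdev > threshold:
                flagged.add(i)
    return sorted(flagged)
```

Calling `flag_outliers(document_text)` returns the positions of the paragraphs whose style stands out; in an IPD setting these would be the candidate plagiarized segments. A z-score rule needs enough segments to be meaningful, so very short documents will rarely produce any flags.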





Information Systems Engineering Articles