Detecting Spammer Bots on Twitter Using Smith-Waterman Similarity and Time Interval Entropy
Abstract
Twitter is a social media platform on which users interact through text-based posts (tweets) of up to 140 characters, which may include photos, videos, and hyperlinks. Spam tweets carry harmful messages that are sent repeatedly. Spam is not only a nuisance but also a danger to recipients, and the problem is made worse by bots that spread spam messages automatically and quickly, potentially causing data damage. This study aims to detect spam bots using two criteria: the similarity between an account's tweets, measured with the Smith-Waterman algorithm, and the time interval between its posts. Tweets are collected with a Python scraping library, recording the id, text, time, and link of each tweet, for the accounts in an existing labeled dataset. The tweet text is preprocessed to clean it before the features are computed. The resulting similarity and post-time-interval features are then classified with k-Nearest Neighbor (k-NN) against the previously labeled dataset to predict whether each account is a spam bot or legitimate. In classification experiments with several values of k using the similarity and interval entropy criteria, the best result was obtained with k = 3 and 10-fold cross-validation, giving a detection accuracy of 80%, a precision of 84%, and a recall of 84%.
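The abstract combines two per-account features, tweet similarity from Smith-Waterman local alignment and the entropy of posting intervals, and feeds them to a k-NN classifier. The following is a minimal Python sketch of that pipeline under stated assumptions, not the authors' implementation: the scoring values (match = 2, mismatch = -1, gap = -1), normalizing the alignment score by the shorter text, averaging similarity over all tweet pairs, binning intervals into 60-second buckets, and every function name are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of strings a and b with a linear gap penalty.

    The scoring values are assumptions for illustration, not the paper's settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1] and b[j-1];
    # clamping at 0 lets an alignment restart anywhere, which makes it local.
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


def sw_similarity(a, b, match=2):
    """Alignment score scaled to [0, 1] by the best possible score of the shorter text (assumed)."""
    if not a or not b:
        return 0.0
    return smith_waterman(a, b, match=match) / (match * min(len(a), len(b)))


def mean_pairwise_similarity(tweets):
    """Hypothetical account-level feature: mean similarity over all pairs of the account's tweets."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(sw_similarity(x, y) for x, y in pairs) / len(pairs)


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (bits) of the gaps between consecutive posts, binned by bin_seconds.

    Regular, machine-like posting puts every gap in one bin and gives low entropy.
    """
    ts = sorted(timestamps)
    gaps = [(t2 - t1) // bin_seconds for t1, t2 in zip(ts, ts[1:])]
    if not gaps:
        return 0.0
    n = len(gaps)
    return sum(c / n * math.log2(n / c) for c in Counter(gaps).values())


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest (Euclidean) to query.

    train is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: (mean similarity, interval entropy) per labeled account.
    train = [((0.90, 0.1), "bot"), ((0.85, 0.2), "bot"), ((0.80, 0.0), "bot"),
             ((0.10, 3.0), "human"), ((0.20, 2.5), "human"), ((0.15, 2.8), "human")]
    tweets = ["promo murah klik link ini", "promo murah klik link itu"]
    features = (mean_pairwise_similarity(tweets), interval_entropy([0, 60, 120, 180]))
    print(features, knn_predict(train, features, k=3))
```

Because the two features live on different scales (similarity in [0, 1], entropy in bits), a real pipeline would likely standardize them before computing distances; the sketch leaves them raw for brevity. Evaluating with 10-fold cross-validation, as the abstract reports, would wrap `knn_predict` in a loop over held-out folds.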
Copyright (c) 2018 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright of each article belongs to its authors.
- The authors acknowledge that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the work, which is published under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).