Bot Spammer Detection on Twitter Using Smith Waterman Similarity and Time Interval Entropy

  • Imam Syafii, Student
  • Arief Setyanto, Universitas Amikom Yogyakarta
  • Suwanto Raharjo, Institut Sains & Teknologi AKPRIND Yogyakarta
Keywords: spammers detection on twitter, time interval entropy, smith waterman similarity

Abstract

Twitter is a social media platform on which users interact through text-based posts (tweets) of up to 140 characters, which may include photos, videos, and hyperlinks. Spam tweets carry harmful messages that are sent continuously. Besides being a nuisance, they are dangerous to recipients, and the problem is made worse by bots that spread spam automatically and quickly and can cause data damage. This study aims to detect spam bots using two signals: the similarity between tweets, measured with the Smith-Waterman algorithm, and the time interval between posts. Tweets are collected with a Python scraping library in the form of id, text, time, and link, based on available labeled datasets. The text is preprocessed to clean it before the calculations are performed. The results of the similarity method and the posting time interval are then classified with k-Nearest Neighbor against the previously labeled dataset to produce a spam-bot or legitimate prediction. Classification experiments with several values of k, using the similarity and interval entropy criteria, gave the best results with k = 3 and 10-fold cross-validation: a detection accuracy of 80%, precision of 84%, and recall of 84%.
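The two features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the alignment unit, the scoring parameters, the normalization, or how intervals are binned, so the word-level tokens, the `match`/`mismatch`/`gap` values, the division by the shorter tweet's length, and the 60-second `bin_seconds` bin width are all assumptions made for illustration.

```python
import math
from collections import Counter


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two token sequences.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, prev[j - 1] + step, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def sw_similarity(text_a, text_b, match=2):
    """Normalize the alignment score to [0, 1] by the best possible score
    for the shorter tweet (assumed normalization)."""
    a, b = text_a.lower().split(), text_b.lower().split()
    if not a or not b:
        return 0.0
    return smith_waterman_score(a, b, match=match) / (match * min(len(a), len(b)))


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of the binned gaps between consecutive posts.

    timestamps are posting times in seconds; bin_seconds is an assumed bin width.
    """
    ts = sorted(timestamps)
    intervals = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not intervals:
        return 0.0
    counts = Counter(int(d // bin_seconds) for d in intervals)
    n = len(intervals)
    return sum((c / n) * math.log2(n / c) for c in counts.values())
```

Under these assumptions, an account that repeats near-identical tweets at fixed intervals yields a similarity close to 1 and an entropy close to 0, while irregular posting of varied text moves both values the other way. The two numbers could then form the feature vector passed to a k-Nearest Neighbor classifier.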

Published
2018-12-12
Section
Technology Information Article