Prediction of Retweets Based on User, Content, and Time Features Using EUSBoost
Abstract
Twitter is a popular microblogging platform on which users write posts. Retweeting is one of the mechanisms by which information diffuses on Twitter, so predicting retweets is one way to understand how information spreads. This study predicts retweets using Evolutionary Undersampling Boosting (EUSBoost) with user-, content-, and time-based features. We also consider a vector representation of the tweet text as a predictive feature. Models built with EUSBoost outperform models built with AdaBoost: the best model achieves an AUC of 77.21% and a GM score of 77.18%, whereas the AdaBoost-based models achieve AUC scores ranging from 68% to 69% and GM scores ranging from 62% to 63%. In addition, we found no significant difference between using numeric features only and combining numeric and text features.
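The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes that GM denotes the geometric mean of sensitivity and specificity, as is common in the imbalanced-classification literature, and that AUC is the usual rank-based area under the ROC curve; the abstract does not spell out either definition. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from math import sqrt


def geometric_mean_score(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (assumed meaning of GM)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def auc_score(y_true, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = retweeted (minority class), 0 = not retweeted.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.6, 0.7]

gm = geometric_mean_score(y_true, y_pred)
auc = auc_score(y_true, scores)
print(f"GM = {gm:.4f}, AUC = {auc:.4f}")
```

Under this assumed definition, GM falls to zero whenever one class is entirely misclassified, which is why it is often preferred over plain accuracy on imbalanced data such as retweet labels.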
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may separately arrange non-exclusive distribution of the manuscript published in this journal in other versions (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).