Naïve Bayes-Support Vector Machine Combined BERT to Classified Big Five Personality on Twitter
Abstract
Twitter is one of the most popular social media platforms for online interaction. A person's personality can be inferred from Twitter through the thoughts, feelings, and behavior patterns they express there. The Big Five model describes personality along five main traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. This study predicts these five traits using the Naïve Bayes-Support Vector Machine method, the Synthetic Minority Over-sampling Technique (SMOTE), Linguistic Inquiry and Word Count (LIWC), and Bidirectional Encoder Representations from Transformers (BERT). The dataset was collected by distributing a questionnaire to Twitter users. SMOTE is applied to the dataset to balance the class distribution. LIWC provides the linguistic features, and BERT provides the semantic approach. Naïve Bayes performs the feature weighting, and the Support Vector Machine classifies the Big Five personality traits. To help improve accuracy, hyperparameter tuning with Optuna is added to the Naïve Bayes-Support Vector Machine model. The combination of SMOTE, BERT, LIWC, and tuning achieves an accuracy of 87.82%, an improvement over the baseline.
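The balancing step mentioned in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is a random point on the line segment between a minority sample and one of its nearest minority neighbours. The abstract does not say how the study implemented SMOTE, so the function name `smote`, the parameter `k`, the seed, and the toy feature vectors below are all assumptions made for illustration, not the authors' code or data.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length feature vectors (lists of floats).
    k: number of nearest minority neighbours considered per sample.
    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up feature vectors (not data from the study).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
majority = [[5.0, 5.0], [5.5, 4.8], [6.0, 5.2], [5.2, 5.9], [4.8, 5.1], [6.1, 4.9]]

# Generate just enough synthetic points to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))  # prints: 6 6
```

In practice a library implementation would normally be applied to the training split only, so that synthetic samples do not leak into the evaluation data.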
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate, additional arrangements for the non-exclusive distribution of the manuscript published in this journal in other versions (e.g., depositing it in the author's institutional repository or publishing it in a book), with an acknowledgment that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.