Naïve Bayes-Support Vector Machine Combined BERT to Classified Big Five Personality on Twitter

  • Billy Anthony Christian Martani, Telkom University
  • Erwin Budi Setiawan, Telkom University
Keywords: BERT, Big Five Personality, LIWC, Naïve Bayes-Support Vector Machine

Abstract

Twitter is one of the most popular social media platforms for online interaction. Through Twitter, a person's personality can be inferred from that person's thoughts, feelings, and behavior patterns. The Big Five model describes personality along five main traits: Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism. This study predicts Big Five personality using a Naïve Bayes-Support Vector Machine (NB-SVM) model combined with the Synthetic Minority Over-sampling Technique (SMOTE), Linguistic Inquiry and Word Count (LIWC), and Bidirectional Encoder Representations from Transformers (BERT). The dataset was collected by distributing a questionnaire to Twitter users. SMOTE is applied to the dataset to balance it. LIWC supplies the linguistic features, and BERT serves as the semantic approach. Naïve Bayes performs the weighting, and the Support Vector Machine classifies the Big Five personalities. To help improve accuracy, hyperparameter tuning with Optuna is applied to the NB-SVM model. Combining SMOTE, BERT, LIWC, and tuning yields an accuracy of 87.82%, an improvement over the baseline.
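The abstract does not state the formula used for the Naïve Bayes weighting. As a minimal sketch, the example below assumes the log-count ratio commonly used in NB-SVM models, with one binary label per personality trait. The function names, the smoothing value alpha, and the toy count matrix are hypothetical and are not taken from the paper.

```python
import math

def nb_log_count_ratio(X, y, alpha=1.0):
    """Compute a Naive Bayes log-count ratio per feature.

    Assumption: binary labels (1 = trait present, 0 = absent) and
    non-negative feature counts. alpha is an add-alpha smoothing term.
    """
    n_features = len(X[0])
    p = [alpha] * n_features  # smoothed counts for the positive class
    q = [alpha] * n_features  # smoothed counts for the negative class
    for row, label in zip(X, y):
        target = p if label == 1 else q
        for j, value in enumerate(row):
            target[j] += value
    p_total, q_total = sum(p), sum(q)
    # r_j = log( (p_j / ||p||_1) / (q_j / ||q||_1) )
    return [math.log((p[j] / p_total) / (q[j] / q_total))
            for j in range(n_features)]

def apply_nb_weights(X, r):
    """Scale every feature column by its log-count ratio."""
    return [[value * r[j] for j, value in enumerate(row)] for row in X]

# Toy term-count matrix: 4 users x 3 features (hypothetical data).
X = [[2, 0, 1],
     [3, 0, 0],
     [0, 2, 1],
     [0, 3, 0]]
y = [1, 1, 0, 0]

r = nb_log_count_ratio(X, y)
X_weighted = apply_nb_weights(X, r)
print(r)
print(X_weighted)
```

Under this assumption, the weighted matrix would then be passed to a Support Vector Machine in place of the raw features. Features that occur equally often in both classes receive a weight near zero, so they contribute little to the classifier.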

Published
2022-12-30
How to Cite
Martani, B. A. C., & Setiawan, E. B. (2022). Naïve Bayes-Support Vector Machine Combined BERT to Classified Big Five Personality on Twitter. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(6), 1072–1078. https://doi.org/10.29207/resti.v6i6.4378
Section
Information Technology Articles