Semantic Approach for Big Five Personality Prediction on Twitter
Abstract
Personality offers deep insight into an individual and plays an important part in job performance. Several studies have examined predicting personality from social media, but improving the performance of such prediction systems remains an open problem. This research aims to predict the personality of Twitter users and to improve the performance of the personality prediction system. An online survey using the Big Five Inventory (BFI) questionnaire was distributed, yielding 295 Twitter users and 511,617 tweets. We experiment with two methods: a Support Vector Machine (SVM) alone, and a combination of SVM and BERT as the semantic approach. We also use Linguistic Inquiry and Word Count (LIWC) as a linguistic feature for the personality prediction system. The results show that combining SVM and BERT achieves an accuracy of 79.35%, and that adding LIWC can raise the accuracy to as much as 80.07%. Overall, these results indicate that the combination of SVM and BERT as the semantic approach, together with LIWC, is recommended for better performance of the personality prediction system.
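As a rough illustration of how a semantic representation and linguistic features might be joined before classification, the sketch below concatenates an embedding vector with per-category word proportions. It is a minimal sketch under stated assumptions, not the authors' implementation: `TOY_LEXICON`, `category_proportions`, and `build_features` are hypothetical names, the three categories are a toy stand-in for the proprietary LIWC dictionary, and the four-number embedding is a placeholder for a real BERT output.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for LIWC categories (the real LIWC
# dictionary is proprietary and far larger).
TOY_LEXICON = {
    "posemo": {"happy", "love", "great"},
    "negemo": {"sad", "angry", "hate"},
    "social": {"friend", "family", "we"},
}


def category_proportions(tokens, lexicon):
    """Return, per category (sorted by name), the share of tokens in it."""
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [counts[category] / total for category in sorted(lexicon)]


def build_features(embedding, tokens, lexicon=TOY_LEXICON):
    """Concatenate a semantic embedding with linguistic category proportions."""
    return list(embedding) + category_proportions(tokens, lexicon)


# Placeholder 4-dimensional vector (assumption); BERT-base embeddings
# have 768 dimensions.
user_embedding = [0.12, -0.40, 0.33, 0.05]
user_tokens = "we love our family but hate mondays".split()

features = build_features(user_embedding, user_tokens)
print(len(features), features)
```

A vector built this way could then be passed to an SVM classifier, with one classifier per Big Five trait being one plausible setup. The abstract does not specify how the features are combined, so the concatenation shown here is an assumption.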
Copyright (c) 2021 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher, publishing under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published version of the manuscript (e.g., depositing it in the author's institutional repository or publishing it in a book), with an acknowledgement that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.