Optimization Prediction of Big Five Personality in Twitter Users

  • Gita Safitri Telkom University
  • Erwin Budi Setiawan Telkom University
Keywords: Big Five Personality, SVM, TF-IDF, LIWC, Optimization


Social media platforms such as Twitter are a rich source of information. A user's biographical information and tweets are essential research assets because they can describe the Big Five personality traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Several previous studies have attempted to predict Big Five personality, but the authors found that optimizing the performance of such prediction systems remains a problem. This study therefore predicts the Big Five personality of Twitter users and improves the performance of the prediction system. We apply optimization techniques such as sampling, feature selection, and hyperparameter tuning, together with linguistic feature extraction such as LIWC and TF-IDF. The data come from 287 Twitter users who answered an online survey based on the Big Five Inventory (BFI) and permitted their data to be crawled. With all optimization techniques applied, the average accuracy is 84.22%, a 74.44% gain over the specified baseline.
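To make the TF-IDF feature extraction named in the abstract concrete, here is a minimal sketch in plain Python. It assumes one common textbook weighting (term frequency normalized by document length, times the natural log of N over document frequency), which may differ from the variant the authors used. The function name `tf_idf` and the toy tweets are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (not confirmed by the paper):
    tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy tweets, already lowercased and tokenized.
tweets = [
    ["love", "meeting", "new", "people"],
    ["love", "quiet", "evenings"],
    ["new", "ideas", "excite", "me"],
]
vectors = tf_idf(tweets)
```

Under this weighting, a term that appears in only one tweet (such as "people") receives a higher weight than a term shared across tweets (such as "love"), which is why TF-IDF tends to emphasize words that distinguish one user's text from another's.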





How to Cite
Gita Safitri, & Erwin Budi Setiawan. (2022). Optimization Prediction of Big Five Personality in Twitter Users. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(1), 85–91. https://doi.org/10.29207/resti.v6i1.3529
Information Systems Engineering Articles