Evaluating Transformer Models for Social Media Text-Based Personality Profiling

  • Anggit Hartanto, Universitas Amikom Yogyakarta
  • Ema Utami, Universitas Amikom Yogyakarta
  • Arief Setyanto, Universitas Amikom Yogyakarta
  • Kusrini, Universitas Amikom Yogyakarta
Keywords: BERT Variants, Profiling Analysis, Transformer

Abstract

This research evaluates the performance of various Transformer models on social media-based classification tasks, focusing on personality profiling. With growing interest in leveraging social media as a data source for understanding individual personality traits, selecting an appropriate model is crucial for accuracy and efficiency in large-scale data processing. Accurate personality profiling can provide valuable insights for applications in psychology, marketing, and personalized recommendation. In this context, the study compares BERT, RoBERTa, DistilBERT, TinyBERT, MobileBERT, and ALBERT to understand their performance differences under varying configurations and dataset conditions and to assess their suitability for nuanced personality profiling tasks. The research methodology involves four experimental scenarios with a structured process of data acquisition, preprocessing, tokenization, model fine-tuning, and evaluation. Scenarios 1 and 2 use the full dataset of 9,920 data points with standard fine-tuning parameters for all models; in Scenario 2, ALBERT is additionally optimized with a customized batch size, learning rate, and weight decay. Scenarios 3 and 4 use 30% of the total dataset, again with additional adjustments for ALBERT, to examine its performance under specific conditions. Each scenario is designed to test model robustness against variations in parameters and dataset size. The experimental results underscore the importance of tailoring fine-tuning parameters to optimize model performance, particularly for parameter-efficient models such as ALBERT. ALBERT and MobileBERT demonstrated strong performance across conditions, excelling in scenarios that demand both accuracy and efficiency. BERT proved a robust and reliable choice, maintaining high performance even with reduced data, while RoBERTa and DistilBERT may require further adjustment to adapt to data-limited conditions. Although efficient, TinyBERT may fall short on tasks demanding high accuracy because of its limited representational capacity. Selecting the right model therefore requires balancing computational efficiency, task-specific requirements, and data complexity.
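The four-scenario design described above can be sketched as a small configuration table. This is an illustrative Python sketch only: the dataset sizes (9,920 points and a 30% subset) come from the abstract, while the specific hyperparameter values and the assignment of ALBERT's custom settings to Scenarios 2 and 4 are assumptions made for illustration, not values reported by the paper.

```python
# Illustrative sketch of the four experimental scenarios.
# Only the dataset sizes are taken from the abstract; all hyperparameter
# values below are placeholder assumptions, not the paper's actual settings.
FULL_DATASET = 9920
SUBSET = int(FULL_DATASET * 0.30)  # Scenarios 3 and 4 use 30% of the data

# Assumed standard fine-tuning configuration shared by all models.
STANDARD = {"batch_size": 16, "learning_rate": 2e-5, "weight_decay": 0.01}

# Assumed ALBERT-specific adjustments (customized batch size,
# learning rate, and weight decay, as the abstract describes).
ALBERT_TUNED = {"batch_size": 32, "learning_rate": 3e-5, "weight_decay": 0.1}

SCENARIOS = {
    1: {"n_samples": FULL_DATASET, "albert_params": STANDARD},
    2: {"n_samples": FULL_DATASET, "albert_params": ALBERT_TUNED},
    3: {"n_samples": SUBSET, "albert_params": STANDARD},
    4: {"n_samples": SUBSET, "albert_params": ALBERT_TUNED},
}

for sid, cfg in SCENARIOS.items():
    print(f"Scenario {sid}: {cfg['n_samples']} samples, "
          f"ALBERT params {cfg['albert_params']}")
```

Structuring the scenarios as data rather than hard-coded branches makes it straightforward to loop over every model and scenario combination during fine-tuning and evaluation.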



Published
2025-01-21
How to Cite
Hartanto, A., Utami, E., Setyanto, A., & Kusrini. (2025). Evaluating Transformer Models for Social Media Text-Based Personality Profiling. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 9(1), 20–30. https://doi.org/10.29207/resti.v9i1.6157
Section
Information Technology Articles
