Analyzing Reddit Data: Hybrid Model for Depression Sentiment using FastText Embedding
Abstract
Depression, a prevalent mental health condition worldwide, substantially affects human cognition, emotions, and behavior. The alarming increase in deaths attributable to depression in recent years underscores the need to address the problem through prevention and treatment. Social media platforms, which strongly influence society and psychological well-being, have become places where people openly express their emotions and experiences, and Reddit is prominent among them. The main aim of this study is to examine the feasibility of predicting individuals' mental states by classifying Reddit posts as depression or non-depression. The work employs deep learning algorithms and word embeddings to analyze the textual and semantic context of narratives in order to detect symptoms of depression. The study used a hybrid BiLSTM-BiGRU model with FastText word embeddings. The BiLSTM-BiGRU model processes each sequence in both directions, capturing dependencies in sequential data, which makes it suitable for tasks that depend on input order or involve uncertainty in the data. On the Reddit dataset, which contains text concerning depression, the model achieved an accuracy of 97.03% and an F1 score of 97.02%.
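To make the idea of bidirectional processing concrete, the following is a minimal sketch and not the model used in the study. It assumes a toy recurrence (an exponential moving average) as a stand-in for an LSTM or GRU cell; the function names and the decay parameter are hypothetical. It only illustrates how a forward pass and a backward pass over the same sequence are paired at each time step, so that every position carries context from both its past and its future.

```python
def run_direction(sequence, decay=0.5):
    """Toy recurrence standing in for an LSTM/GRU cell (assumption, for illustration only)."""
    state = 0.0
    states = []
    for value in sequence:
        # Mix the previous state with the current input.
        state = decay * state + (1.0 - decay) * value
        states.append(state)
    return states


def bidirectional_states(sequence, decay=0.5):
    """Pair forward and backward states for every time step."""
    forward = run_direction(sequence, decay)
    # Run over the reversed sequence, then flip back so indices line up.
    backward = run_direction(sequence[::-1], decay)[::-1]
    return list(zip(forward, backward))


if __name__ == "__main__":
    for step, (fwd, bwd) in enumerate(bidirectional_states([1.0, 0.0, 0.0])):
        print(step, fwd, bwd)
```

In a real BiLSTM or BiGRU layer the two directions use learned gated cells with separate weights, and their hidden states are typically concatenated before being passed to the next layer.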
Copyright (c) 2024 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published version of the manuscript (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgement that it was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).