Automatic Classification of Multilanguage Scientific Papers to the Sustainable Development Goals Using Transfer Learning

Lya Hulliyyatus Suadaa; Anugerah Karta Monika; Berliana Sugiarti Putri; Yeni Rimawati

doi:10.29207/resti.v9i3.6560

Lya Hulliyyatus Suadaa Politeknik Statistika STIS
Anugerah Karta Monika Politeknik Statistika STIS
Berliana Sugiarti Putri Badan Pusat Statistik
Yeni Rimawati Politeknik Statistika STIS

Keywords: SDGs research, scientific papers, multilabel text classification, multilingual model

Abstract

Classifying scientific papers by their relevance to the Sustainable Development Goals (SDGs) is a critical task for identifying the state of research on each goal. However, as the volume of scientific literature published worldwide in multiple languages grows, manual categorization of these papers has become increasingly complex and time-consuming. Training effective models is further complicated by the need for a comprehensive multilingual dataset, which is resource-intensive to obtain for many languages. This study addresses the problem by using transfer learning to classify scientific papers into SDG labels automatically. By fine-tuning the pretrained multilingual model mBERT on SDG publication datasets in a multilabel setting, we show that transfer learning significantly improves classification performance over an SVM baseline, even with limited labelled data. Our approach processes scientific papers in different languages and maps each paper at three levels: its relevance to the SDGs, the four SDG pillars, and the 17 SDGs. The proposed method addresses the scalability issue in SDG classification and lays the groundwork for more efficient systems that can handle the multilingual nature of modern scientific publications.
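The abstract does not describe how the multilabel outputs are trained or how they become the three label levels. The following is a minimal sketch, assuming one sigmoid output per goal (the usual multilabel setup for a fine-tuned BERT classification head), a 0.5 decision threshold, and the grouping of the 17 goals into four pillars used in Indonesia's national SDG framework (Bappenas). The pillar mapping, the threshold, and every function name here are our assumptions, not the authors' method; the paper may instead train a separate classifier for each level. Model logits are stubbed with fixed numbers so the example runs without mBERT.

```python
import math

NUM_GOALS = 17

# ASSUMPTION: Bappenas grouping of the 17 goals into four pillars;
# the abstract names "four pillars" but does not list them.
PILLARS = {
    "social": {1, 2, 3, 4, 5},
    "economic": {7, 8, 9, 10, 17},
    "environmental": {6, 11, 12, 13, 14, 15},
    "law_and_governance": {16},
}


def multi_hot(goals):
    """Encode a set of goal numbers (1-17) as a 17-dim 0/1 target vector."""
    return [1 if g in goals else 0 for g in range(1, NUM_GOALS + 1)]


def sigmoid(z):
    """Numerically stable logistic function (no overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over goals, computed from raw logits.

    Each goal is an independent yes/no decision, which is what lets a
    paper carry several SDG labels at once.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def decode(logits, threshold=0.5):
    """Turn 17 goal logits into goal, pillar and relevance labels.

    ASSUMPTION: pillars and relevance are derived from the predicted
    goals rather than predicted by separate models.
    """
    goals = {g for g, z in enumerate(logits, start=1) if sigmoid(z) >= threshold}
    pillars = {name for name, members in PILLARS.items() if goals & members}
    return {"relevant": bool(goals), "pillars": pillars, "goals": goals}


if __name__ == "__main__":
    # Stand-in for mBERT output logits for one paper (hypothetical values).
    example_logits = [2.0, -3.0, 0.5, -1.0, -2.0, -4.0, -1.5, -0.2, -3.0,
                      -2.5, -1.0, -2.0, 1.2, -3.0, -3.0, -2.0, -1.0]
    print(decode(example_logits))
    print(bce_with_logits(example_logits, multi_hot({1, 3, 13})))
```

With these stub logits, goals 1, 3 and 13 pass the threshold, so the paper is marked relevant and assigned to the social and environmental pillars. Deriving the pillar and relevance labels from the goal predictions keeps the three levels consistent by construction; separately trained per-level classifiers can disagree with one another.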
