Automatic Classification of Multilanguage Scientific Papers to the Sustainable Development Goals Using Transfer Learning
Abstract
Classifying scientific papers by their relevance to the Sustainable Development Goals (SDGs) is critical for tracking how far research on each goal has progressed. However, as the volume of scientific literature published worldwide in multiple languages grows, manual categorization of these papers has become increasingly complex and time-consuming. Training effective models adds a further difficulty: it calls for a comprehensive multilingual dataset, and obtaining such datasets for many languages is resource-intensive. This study addresses the problem by using transfer learning to automatically assign SDG labels to scientific papers. By fine-tuning the pretrained multilingual model mBERT on SDG publication datasets in a multilabel setting, we demonstrate that transfer learning can significantly improve classification performance over an SVM baseline, even with limited labelled data. Our approach processes scientific papers in different languages and maps research at three levels: relevance to the SDGs, the four SDG pillars, and the 17 individual goals. The proposed method addresses the scalability issue in SDG classification and lays the groundwork for more efficient systems that can handle the multilingual nature of modern scientific publications.
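To make the multilabel setting concrete, the following is a minimal sketch of how a paper's 17-goal labels might be encoded, scored, and decoded. It is not the authors' implementation: the abstract does not describe the classification head, the loss, or the decision threshold, so the independent sigmoid per goal, the binary cross-entropy loss, the 0.5 threshold, and all function names here (`encode_goals`, `bce_with_logits`, `decode_goals`) are assumptions chosen for illustration. The logits stand in for the outputs a fine-tuned model would produce.

```python
import math

NUM_GOALS = 17  # the 17 SDGs named in the abstract


def encode_goals(goals):
    """Turn a collection of SDG numbers (1..17) into a 17-dim multi-hot vector."""
    vec = [0.0] * NUM_GOALS
    for g in goals:
        if not 1 <= g <= NUM_GOALS:
            raise ValueError(f"SDG number out of range: {g}")
        vec[g - 1] = 1.0
    return vec


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy, scoring each goal independently.

    Uses the stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def decode_goals(logits, threshold=0.5):
    """Return the SDG numbers whose sigmoid score reaches the threshold (assumed 0.5)."""
    return [i + 1 for i, z in enumerate(logits) if sigmoid(z) >= threshold]


# Hypothetical logits for one paper: strong evidence for SDG 6 and SDG 13 only.
logits = [-4.0] * NUM_GOALS
logits[5] = 3.0   # SDG 6
logits[12] = 2.0  # SDG 13

predicted = decode_goals(logits)          # [6, 13]
loss = bce_with_logits(logits, encode_goals(predicted))
```

Because each goal gets its own sigmoid rather than sharing a softmax, a paper can be assigned several goals at once, or none, which is what distinguishes multilabel from multiclass classification.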
Copyright (c) 2025 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published manuscript in other versions (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).