Minangkabau Language Stemming: A New Approach with Modified Enhanced Confix Stripping

  • Fadhli Almu'iini Ahda, Institut Teknologi dan Bisnis Asia Malang
  • Aji Prasetya Wibawa, Universitas Negeri Malang
  • Didik Dwi Prasetya, Universitas Negeri Malang
  • Danang Arbian Sulistyo, Universitas Negeri Malang
  • Andrew Nafalski, University of South Australia
Keywords: enhanced confix stripping, Minangkabau language, morphology, natural language processing, stemming

Abstract

Stemming is an essential procedure in natural language processing (NLP) that reduces words to their root forms by removing affixes, including prefixes, infixes, and suffixes. The effectiveness of stemming depends on the method employed, and suitable methods differ from language to language. Complex affixation patterns in Indonesian and in regional languages such as Minangkabau pose considerable difficulties for traditional algorithms. This research adapts the enhanced confix stripping method to tackle these issues by integrating linguistic characteristics unique to Minangkabau. The study has three phases: data acquisition, pseudocode development, and algorithm execution. Testing revealed an average accuracy of 77.8%, indicating that the algorithm handles much of Minangkabau’s intricate morphology. Nevertheless, limitations persist, particularly with irregular affixation patterns. Possible improvements include expanding the dataset, refining the affix-handling rules, and using machine learning to make the system more flexible and accurate. This study emphasizes the significance of customized solutions for regional languages and provides insights into the advancement of NLP in diverse linguistic environments. The findings show the progress made in processing Minangkabau text while also pointing to the need for further research on the remaining issues.
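
To make the idea of dictionary-checked affix stripping concrete, the following is a minimal sketch in Python. It is not the algorithm evaluated in this study: the affix lists, the tiny root dictionary, the minimum stem length, and the function names `stem` and `accuracy` are all illustrative assumptions. It only shows the general pattern of removing candidate suffixes and prefixes and accepting a result when it appears in a root-word dictionary, plus how an accuracy figure could be computed against gold-standard roots.

```python
# Illustrative assumptions only: these affix lists and this root dictionary
# are NOT the rule set or lexicon used in the paper.
PREFIXES = ("ba", "di", "ta", "pa", "sa")
SUFFIXES = ("kan", "an", "i")
ROOTS = {"jalan", "bali", "makan"}
MIN_STEM_LEN = 3  # assumed guard against stripping a word down to nothing


def stem(word, roots=ROOTS):
    """Return a dictionary root for `word`, or the lowercased word if none is found."""
    word = word.lower()
    if word in roots:
        # A word that is already a root is never stripped (e.g. "makan").
        return word

    # Start with the word itself, then add every suffix-stripped variant.
    candidates = [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            candidates.append(word[: -len(suffix)])

    # For each candidate so far, also try removing one prefix.
    for candidate in list(candidates):
        for prefix in PREFIXES:
            if candidate.startswith(prefix) and len(candidate) - len(prefix) >= MIN_STEM_LEN:
                candidates.append(candidate[len(prefix):])

    # Accept the first candidate that is a known root.
    for candidate in candidates:
        if candidate in roots:
            return candidate
    return word


def accuracy(pairs, roots=ROOTS):
    """Fraction of (word, gold_root) pairs for which stem(word) equals gold_root."""
    correct = sum(stem(word, roots) == gold for word, gold in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    for w in ("bajalan", "dibali", "makan"):
        print(w, "->", stem(w))
```

The dictionary check is what keeps a root such as "makan" from being cut at its final "kan". Irregular affixation, which the abstract names as a remaining limitation, is exactly what a fixed rule list like this one does not capture.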

Published
2025-06-22
How to Cite
Ahda, F. A., Wibawa, A. P., Prasetya, D. D., Sulistyo, D. A., & Nafalski, A. (2025). Minangkabau Language Stemming: A New Approach with Modified Enhanced Confix Stripping. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 9(3), 593–603. https://doi.org/10.29207/resti.v9i3.6511
Section
Artificial Intelligence