Using Indonesian as a Pivot Language in a Madurese-Sundanese Machine Translation System with the Transfer and Triangulation Methods
Abstract
This paper investigates the pivot (bridge) language technique, in which a pivot language is used to improve the quality of Statistical Machine Translation (SMT). Here, Indonesian serves as the pivot language, so that the available corpora can be used to support the Madurese-Sundanese language pair. The experiments used Indonesian-Madurese and Indonesian-Sundanese parallel corpora of 5K and 6K sentences, respectively, and monolingual corpora in Malay, Sundanese, and Indonesian of 10K, 10K, and 100K sentences, respectively. The study compares the Triangulation and Transfer methods with Indonesian as the pivot language. The results show that the Triangulation method performs better than the Transfer method. In the experiments, the Triangulation method increased the average Indonesian pivot-based SMT test results by 6.18% for Madura-Sundanese SMT and 7.27% for Madurese-Sundanese SMT.
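The abstract does not describe how the two methods were implemented, so the following is only a minimal sketch of how they are commonly formulated in pivot-based SMT, not the authors' code. It assumes that triangulation builds a source-to-target phrase table by summing over shared pivot phrases, and that transfer cascades two translation steps through the single best pivot phrase. The function names, the toy phrase tables, and the placeholder tokens (`mad_a`, `ind_x`, `sun_1`, and so on) are all hypothetical.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Build a source->target table by marginalising over pivot phrases.

    Assumed formulation: p(t | s) = sum over pivots p of p(p | s) * p(t | p).
    """
    table = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                table[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in table.items()}


def transfer(src, src_to_pivot, pivot_to_tgt):
    """Cascade: pick the best pivot phrase, then its best target phrase."""
    pivots = src_to_pivot.get(src)
    if not pivots:
        return None
    best_pivot = max(pivots, key=pivots.get)
    targets = pivot_to_tgt.get(best_pivot)
    if not targets:
        return None
    return max(targets, key=targets.get)


# Hypothetical toy tables; the tokens are placeholders, not real words.
madurese_to_indonesian = {"mad_a": {"ind_x": 0.6, "ind_y": 0.4}}
indonesian_to_sundanese = {
    "ind_x": {"sun_1": 0.55, "sun_2": 0.45},
    "ind_y": {"sun_2": 1.0},
}

triangulated = triangulate(madurese_to_indonesian, indonesian_to_sundanese)
best_by_triangulation = max(triangulated["mad_a"], key=triangulated["mad_a"].get)
best_by_transfer = transfer("mad_a", madurese_to_indonesian, indonesian_to_sundanese)
print(triangulated["mad_a"], best_by_triangulation, best_by_transfer)
```

In this toy case the two methods disagree: transfer commits to the single best pivot phrase, while triangulation pools probability mass from every pivot phrase that leads to the same target. This illustrates one way the methods can differ; it says nothing about the results reported in the paper.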
Copyright (c) 2019 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may separately arrange non-exclusive distribution of the manuscript published in this journal in other versions (e.g., depositing it in the author's institutional repository or publishing it in a book), with an acknowledgment that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).