Using Indonesian as a Pivot Language in a Madurese-Sundanese Machine Translator with the Transfer and Triangulation Methods

  • Herry Sujaini, Universitas Tanjungpura
Keywords: statistical machine translation, pivot language, Indonesian, Madurese-Sundanese

Abstract

This paper investigates the pivot (bridge) language technique, in which a pivot language is used to improve Statistical Machine Translation (SMT) quality. Here, Indonesian serves as the pivot language, so that each available corpus can be used to support the Madurese-Sundanese language pair. The experiments used Indonesian-Madurese and Indonesian-Sundanese parallel corpora of 5K and 6K sentences respectively, and monolingual corpora for Malay, Sundanese and Indonesian of 10K, 10K and 100K sentences respectively. The study compares the Triangulation and Transfer methods with Indonesian as the pivot language. The results show that the Triangulation method yields a larger improvement than the Transfer method. In the experiments, the Triangulation method increased the average Indonesian pivot-based SMT test results by 6.18% for Madura-Sundanese SMT and 7.27% for Madurese-Sundanese SMT.
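
The abstract does not spell out how the two methods work, so the following is only a minimal sketch under the usual phrase-based definitions: Triangulation merges a source-pivot and a pivot-target phrase table by summing over shared pivot phrases, while Transfer translates source to pivot and then pivot to target in cascade. It is not the author's implementation; the function names, the toy tokens (`mad_a`, `ind_x`, `sun_1`, ...) and all probabilities are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Build a source-target table: p(t|s) = sum over pivots of p(p|s) * p(t|p)."""
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                src_to_tgt[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}


def transfer(src, src_to_pivot, pivot_to_tgt):
    """Cascade: take the single best pivot phrase, then its single best target."""
    pivots = src_to_pivot.get(src)
    if not pivots:
        return None
    best_pivot = max(pivots, key=pivots.get)
    targets = pivot_to_tgt.get(best_pivot)
    if not targets:
        return None
    return max(targets, key=targets.get)


# Hypothetical toy phrase tables (placeholder tokens, invented probabilities).
mad_to_ind = {"mad_a": {"ind_x": 0.6, "ind_y": 0.4}}
ind_to_sun = {
    "ind_x": {"sun_1": 0.55, "sun_2": 0.45},
    "ind_y": {"sun_2": 1.0},
}

mad_to_sun = triangulate(mad_to_ind, ind_to_sun)
best_by_triangulation = max(mad_to_sun["mad_a"], key=mad_to_sun["mad_a"].get)
best_by_transfer = transfer("mad_a", mad_to_ind, ind_to_sun)

print(mad_to_sun)
print(best_by_triangulation, best_by_transfer)
```

In this toy case the two methods disagree: Transfer commits to the single best pivot phrase early, whereas Triangulation accumulates probability mass across every pivot phrase that leads to the same target. A real system would do this over full phrase tables with several feature scores and a language model, which the sketch leaves out.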

Published
2019-08-02
Section
Information Technology Articles