Evaluation of Hidden Topics Based on Aspect Extraction Using Extensions of Latent Dirichlet Allocation
Abstract
Sentiment analysis is widely used to detect opinions expressed about products or services. Aspect-level sentiment analysis is the category that focuses on extracting product aspects. A common method for aspect extraction is Latent Dirichlet Allocation (LDA) with random topic identification, but this method does not always find an acceptable topic for the aspects it has found. Topics that cannot be determined are referred to as hidden topics. The purpose of this study is to evaluate and compare how well hidden topics are identified by human evaluation and by computer evaluation. The study also focuses on aspect extraction using several LDA variants. The data come from an e-commerce case study. They were processed with feature selection and grouped using the LDA variants. The results were then processed with Latent Topic Identification based on subjective and objective evaluations. The identified hidden topics were evaluated using several semantic and lexicon tests. The evaluation results indicate that the two hidden-topic identification assessments are reasonably consistent, with an average difference reaching 6%. Consequently, computer calculations can assist humans in determining topics when a topic has a low coherence value.
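To make the notion of a low coherence value concrete, the following is a minimal sketch of one common objective measure, UMass topic coherence, computed from document co-occurrence counts. The abstract does not state which coherence measure the study used, so the choice of UMass, the function name `umass_coherence`, and the toy review data are all assumptions made for illustration.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic (illustrative helper, not from the paper).

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over all pairs of top words with
    l < m, where D counts the documents containing the given word(s).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for l, m in combinations(range(len(top_words)), 2):
        w_l, w_m = top_words[l], top_words[m]
        d_l = doc_freq(w_l)
        if d_l == 0:
            continue  # skip words that never occur in the corpus
        score += math.log((doc_freq(w_m, w_l) + 1) / d_l)
    return score


# Hypothetical tokenized e-commerce reviews (not data from the study).
reviews = [
    ["battery", "life", "long"],
    ["battery", "charge", "fast"],
    ["delivery", "late", "package"],
    ["delivery", "fast", "package"],
]

coherent_topic = ["battery", "charge", "life"]
mixed_topic = ["battery", "late", "package"]

coherent_score = umass_coherence(coherent_topic, reviews)
mixed_score = umass_coherence(mixed_topic, reviews)
print(coherent_score, mixed_score)
```

Under this measure, a topic whose top words rarely appear in the same reviews receives a lower score, which is the kind of signal that could flag a topic for closer human inspection.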
Copyright (c) 2021 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher, publishing under a Creative Commons Attribution 4.0 International License.
- Authors may separately enter into arrangements for the non-exclusive distribution of the version of the manuscript published in this journal (e.g., depositing it in the author's institutional repository or publishing it in a book), with an acknowledgement that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.