Sentiment Classification for Film Reviews by Reducing Additional Introduced Sentiment Bias
Abstract
The film business cannot be separated from the individual reviews of its films, and film review sites such as IMDb are a credible source of reviews posted in public forums. Because IMDb reviews are unstructured and heavily biased, classification methods that reduce additional introduced sentiment bias are needed to produce a balanced classification with lower polarity bias. Eliminating this additional sentiment bias improves the model, because polarity is then determined by a non-biased method, so the model can correctly identify which word sequences are positive and which are negative. This research uses a dataset of 50,000 reviews randomly extracted from the IMDb website, prepared with preprocessing, POS tagging, and word embeddings. The preprocessed data is then classified with ANN, SWN (SentiWordNet), and SO-Cal. This paper also applies bias-processing methods, namely hyperparameter tuning and BPM, and evaluates the outputs with the Accuracy and PBR metrics. This research achieves an accuracy of 77.39% for ANN, 66.32% for BPM, 75.6% for SO-Cal, and 76.26% for the hybrid classification. The best PBR values were obtained by two lexicon-based methods: 0.0009 for BPM and 0.00006 for SO-Cal. A more advanced ANN configuration could improve the model, and more complex lexicon models are a direction for future research.
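The abstract evaluates each classifier by Accuracy and PBR but does not define PBR. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. It assumes binary labels (1 = positive, 0 = negative). It also assumes that PBR is the absolute difference between false positives and false negatives divided by the number of reviews, which equals the gap between the predicted and the true share of positive reviews. The paper's exact formula may differ. The function names `accuracy` and `polarity_bias_rate` are hypothetical.

```python
from typing import Sequence


def _check(y_true: Sequence[int], y_pred: Sequence[int]) -> None:
    # Both label sequences must be non-empty and aligned review-by-review.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of reviews whose predicted polarity matches the gold label."""
    _check(y_true, y_pred)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def polarity_bias_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """ASSUMED PBR definition (hypothetical): |FP - FN| / N.

    FP counts negative reviews predicted positive; FN counts positive reviews
    predicted negative. Because predicted positives = TP + FP and actual
    positives = TP + FN, this equals the gap between the predicted and the
    true share of positive reviews.
    """
    _check(y_true, y_pred)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return abs(fp - fn) / len(y_true)


if __name__ == "__main__":
    gold = [1, 1, 1, 0, 0, 0, 1, 0]  # toy gold labels: 1 = positive, 0 = negative
    pred = [1, 1, 0, 0, 1, 1, 1, 0]  # toy predictions: 1 FN, 2 FP
    print(accuracy(gold, pred))            # 0.625
    print(polarity_bias_rate(gold, pred))  # 0.125
```

Under this assumed definition, a PBR near zero means that the classifier's errors are split almost evenly between the two polarities. It does not mean that the classifier makes few errors. This is why a method can pair a low PBR with a modest accuracy.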
Copyright (c) 2021 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright of each article belongs to the author.
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher, publishing under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the manuscript version published in this journal (e.g., depositing it in the author's institutional repository or publishing it in a book), provided they acknowledge that the manuscript was first published in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal.