Solution to Scalability and Sparsity Problems in Collaborative Filtering using K-Means Clustering and Weight Point Rank (WP-Rank)
Abstract
Collaborative filtering is a widely used method in recommendation systems. It works by analyzing patterns in rating data to predict which items will interest a user. The process begins with collecting and analyzing large amounts of information on users' behavior, activities, and preferences; the results are then used to predict what a user will like based on similarity to other users. Collaborative filtering can produce better-quality recommendations than content-based and demographic-based recommendation systems. However, it still faces scalability and sparsity problems: the data keeps growing until it becomes big data, and many records are incomplete or have missing ratings. This study therefore proposes a clustering- and ranking-based approach, using K-Means as the clustering algorithm and the WP-Rank method for ranking. The experimental results showed that clustering reduced running time, with an average execution time of 0.15 seconds. The approach also improved recommendation quality, as indicated by an increase in NDCG: at k=22, the average NDCG was 0.82, so the recommendations were of higher quality and better matched to users' interests.
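To illustrate the NDCG metric reported in the abstract, the following is a minimal Python sketch of NDCG at a cutoff k. It assumes the common linear-gain form of DCG (relevance divided by log2 of the rank position plus one); the abstract does not state which DCG variant the study used, and the function names and sample ratings here are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain, assumed)."""
    # Position 0 is rank 1, so the discount is log2(rank + 1) = log2(pos + 2).
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = sorted(relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all, so the score is defined as 0
    return dcg_at_k(relevances, k) / ideal_dcg


# Hypothetical user ratings of the recommended items, in recommended order.
ranked_ratings = [5, 3, 4, 0, 2]
score = ndcg_at_k(ranked_ratings, k=5)
print(f"NDCG@5 = {score:.3f}")
```

A score of 1.0 means the items were recommended in exactly the order the user's own ratings would produce; lower values mean highly rated items were placed further down the list.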
Copyright (c) 2023 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the article, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of manuscripts published in this journal in other versions (e.g., depositing them in the author's institutional repository or publishing them in a book), provided they acknowledge that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).