Summarization and Classification of Sports News using Textrank and KNN
Abstract
Summarization is a critical step in news analysis, but it is often hampered by the large volume of news articles and by the need to classify them. This study develops a model that summarizes and categorizes news to support that analysis. TextRank is used for summarization and K-Nearest Neighbors (KNN) for classification, so that news can be summarized and grouped automatically, making content analysis easier. The study object is sports news published from July to August 2023, and the supervised labels cover three branches of sport: soccer, badminton/tennis, and basketball. The KNN classifier is trained on 500 labeled news articles. With k = 3 and k = 5, the model reaches a precision of about 0.9866 and 0.9666, respectively. When applied to unseen text, the model predicts the category correctly as long as the news content belongs to one of the three specified categories, but fails for content outside them.
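The two stages named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the abstract does not state the text features, similarity measure, or preprocessing used, so the word-overlap sentence similarity, the bag-of-words cosine similarity for KNN, the small stopword list, and all names (`tokenize`, `textrank_summary`, `knn_predict`, `TOY_TRAIN`) are assumptions made here for illustration. The nine training sentences are invented toy data, not the study's 500 articles.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stopword list (not from the paper).
STOPWORDS = {"the", "a", "an", "and", "in", "of", "to", "with", "at",
             "from", "after", "before", "he", "she"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def sentence_similarity(a, b):
    """Word overlap normalized by log sentence lengths (TextRank-style)."""
    wa, wb = set(tokenize(a)), set(tokenize(b))
    if len(wa) < 2 or len(wb) < 2:  # avoids log(1) + log(1) = 0 in the denominator
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    """Rank sentences with weighted PageRank; return the top ones in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    n = len(sentences)
    if n <= n_sentences:
        return sentences
    weights = [[sentence_similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbors, proportional to edge weight.
        scores = [(1 - damping) + damping * sum(
                      weights[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]


def cosine(c1, c2):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Majority vote among the k training texts most similar to `text`."""
    query = Counter(tokenize(text))
    ranked = sorted(train, reverse=True,
                    key=lambda item: cosine(Counter(tokenize(item[0])), query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, invented for this sketch.
TOY_TRAIN = [
    ("The striker scored a late goal and the goalkeeper saved a penalty", "soccer"),
    ("The club won the league match after a header from a corner kick", "soccer"),
    ("The defender received a red card before the penalty shootout", "soccer"),
    ("The shuttler won the badminton final with a smash at the net", "badminton/tennis"),
    ("She served an ace and won the tennis set in a tiebreak", "badminton/tennis"),
    ("The doubles pair lost the racket rally in the third set", "badminton/tennis"),
    ("The guard hit a three pointer and the center grabbed a rebound", "basketball"),
    ("He made a dunk and two free throws in the fourth quarter", "basketball"),
    ("The point guard led the basketball team with assists and a rebound", "basketball"),
]

if __name__ == "__main__":
    article = ("The striker scored a goal in the final. "
               "The goal gave the club the league title. "
               "Fans celebrated in the city. "
               "The coach praised the striker after the final.")
    print(textrank_summary(article, n_sentences=2))
    print(knn_predict(TOY_TRAIN, article, k=3))
```

Note that a KNN classifier of this kind always returns one of the labels it was trained on, which is consistent with the reported failure on news that falls outside the three categories.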
Copyright (c) 2024 Journal of Systems Engineering and Information Technology (JOSEIT)
This work is licensed under a Creative Commons Attribution 4.0 International License.