Summarization and Classification of Sports News using Textrank and KNN

Falahah

doi:10.29207/joseit.v3i1.5706

Falahah Telkom University

Keywords: sport news, summarization, classification, textrank, KNN

Abstract

Summarization is a critical step in news analysis. In practice it is often hampered by the large volume of news articles and by the need to classify them. This study aims to develop a model that summarizes and classifies news in order to support news analysis. TextRank is the proposed summarization method, and K-Nearest Neighbors (KNN) is used for classification. The resulting model can automatically summarize and group news, making content analysis easier. The study object is sports news published from July to August 2023, and supervised classification is used to assign each article to one of three categories: soccer, badminton/tennis, or basketball. The KNN model is trained on 500 labeled news articles. Models with k = 3 and k = 5 achieve a precision of about 0.9866 and 0.9666, respectively. Applied to unseen text, the model predicts the category correctly as long as the content falls into one of the three specified categories, but it fails for content outside them.
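
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not state how sentences or articles are represented, so the sketch assumes raw term counts compared by cosine similarity for both the TextRank sentence graph and the KNN neighbor search. The function names (`textrank_summary`, `knn_predict`), the tokenizer, the damping factor of 0.85, and the toy training sentences are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase ASCII word tokens, no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    # Split into sentences and build a weighted similarity graph between them.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    n = len(sentences)
    if n <= n_sentences:
        return sentences
    vectors = [Counter(tokenize(s)) for s in sentences]
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    # PageRank-style power iteration over the sentence graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n + damping * sum(
                      weights[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    # Keep the top-scoring sentences, restored to their original order.
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences])
    return [sentences[i] for i in top]


def knn_predict(train, text, k=3):
    # train: list of (document, label); majority vote among the k most similar.
    query = Counter(tokenize(text))
    ranked = sorted(train, key=lambda item: cosine(Counter(tokenize(item[0])), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set standing in for the 500 labeled articles.
TRAIN = [
    ("The striker scored a goal in the soccer match", "soccer"),
    ("The goalkeeper saved a penalty in the football league", "soccer"),
    ("The midfielder scored a late goal for the club", "soccer"),
    ("The shuttler won the badminton final with a smash", "badminton/tennis"),
    ("She served an ace to win the tennis set", "badminton/tennis"),
    ("The doubles pair won the badminton open", "badminton/tennis"),
    ("The guard hit a three pointer in the basketball game", "basketball"),
    ("He grabbed a rebound and made a dunk", "basketball"),
    ("The basketball team won after a late three pointer", "basketball"),
]

ARTICLE = ("The team won the match. The match was played in the rain. "
           "Fans celebrated the team. Tickets sold fast.")

if __name__ == "__main__":
    print(textrank_summary(ARTICLE, n_sentences=2))
    print(knn_predict(TRAIN, "The striker scored a penalty goal", k=3))
```

Because `knn_predict` always returns the majority label among the nearest training documents, it can only ever answer with one of the three trained categories. This is consistent with the limitation reported in the abstract for news that belongs to none of them.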