Abstractive and Extractive Approaches for Summarizing Multi-document Travel Reviews
Abstract
Travel reviews offer insights into users' experiences at the places they have visited, including hotels, restaurants, and tourist attractions. Such reviews form a multi-document setting, in which a single place accumulates reviews from many different users. Automatic summarization can help users obtain the main information from these multi-document collections. Automatic summarization comprises abstractive and extractive approaches. The abstractive approach has the advantage of producing coherent and concise sentences, while the extractive approach has the advantage of producing an informative summary. However, the abstractive approach can yield summaries that are inaccurate and less informative, whereas the extractive approach produces longer sentences than the abstractive approach. Based on the characteristics of both approaches, we combine abstractive and extractive methods to produce a summary that is more concise and informative than either approach alone can achieve. To assess the effectiveness of the combined approach, we use ROUGE, which is based on lexical overlap, and BERTScore, which is based on contextual embeddings, and compare the results with the partial approaches (abstractive only or extractive only). The experimental results demonstrate that the combination of abstractive and extractive approaches, namely BERT-EXT, leads to improved performance: the ROUGE-1 (unigram), ROUGE-2 (bigram), ROUGE-L (longest common subsequence), and BERTScore values are 29.48%, 5.76%, 33.59%, and 54.38%, respectively. Combining the abstractive and extractive approaches yields higher performance than either partial approach.
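As a rough illustration of the evaluation described above, the sketch below scores one candidate summary against a reference using ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore. It is a minimal sketch, assuming the Python rouge-score and bert-score packages; the example review texts and the language setting (lang="id", on the assumption that the reviews are Indonesian) are illustrative and do not reflect the authors' exact configuration.

# Minimal sketch of the lexical-overlap and contextual-embedding evaluation.
# Assumptions: the rouge-score and bert-score packages are installed; the
# example texts are invented and are not taken from the paper's dataset.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Hotel bersih dan nyaman, stafnya ramah, dan lokasinya dekat pusat kota."
candidate = "Hotel yang nyaman dengan staf ramah dan lokasi strategis dekat pusat kota."

# ROUGE-1 (unigram), ROUGE-2 (bigram), and ROUGE-L (longest common subsequence).
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
for name, result in scorer.score(reference, candidate).items():
    print(f"{name}: F1 = {result.fmeasure:.4f}")

# BERTScore with contextual embeddings; lang="id" falls back to a multilingual BERT encoder.
precision, recall, f1 = bert_score([candidate], [reference], lang="id")
print(f"BERTScore: F1 = {f1.mean().item():.4f}")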