Bidirectional Long Short-Term Memory and Word Embedding Feature for Improvement Classification of Cancer Clinical Trial Document
Abstract
In recent years, deep learning methods have become increasingly popular, especially for big data, where very large data sets must be predicted accurately. One example of such data is the document text of cancer clinical trials. Clinical trials are research studies involving human participants that aim to improve people's safety and health. The aim of this paper is to classify cancer clinical trial texts from a public data set. The proposed approach combines Bidirectional Long Short-Term Memory (BiLSTM) with Word Embedding (WE) features. The study contributes a new classification model for clinical trial documents and improves classification performance. Two experiments were conducted: BiLSTM without WE and BiLSTM with WE. BiLSTM without WE achieved accuracy = 86.2, precision = 85.5, recall = 87.3, and F1 score = 86.4. BiLSTM with WE performed markedly better on clinical trial text classification, with accuracy = 92.3, precision = 92.2, recall = 92.9, and F1 score = 92.5.
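The four evaluation scores reported above can be illustrated with a minimal Python sketch. It assumes a binary labelling (positive class = 1), which the abstract does not state, and the function names `binary_metrics` and `f1_from_precision_recall` as well as the toy labels are hypothetical and not taken from the paper. The last lines only check that the reported F1 scores are consistent with the reported precision and recall.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels.

    Assumption: two classes, with `positive` marking the class of interest.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from_precision_recall(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented purely for illustration (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_scores = binary_metrics(y_true, y_pred)
print(toy_scores)

# Consistency check on the scores reported in the abstract.
f1_without_we = round(f1_from_precision_recall(85.5, 87.3), 1)
f1_with_we = round(f1_from_precision_recall(92.2, 92.9), 1)
print(f1_without_we, f1_with_we)  # 86.4 92.5
```

Recomputing F1 from the stated precision and recall gives 86.4 and 92.5, matching the values reported for the two experiments.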
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author.
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the published version of the manuscript (e.g., depositing it in an institutional repository or publishing it in a book), with an acknowledgement that it was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).