Bidirectional Long Short-Term Memory and Word Embedding Feature for Improvement Classification of Cancer Clinical Trial Document

  • Jasmir Jasmir Universitas Dinamika Bangsa
  • Willy Riyadi Universitas Dinamika Bangsa
  • Silvia Rianti Agustini Universitas Dinamika Bangsa
  • Yulia Arvita Universitas Dinamika Bangsa
  • Despita Meisak Universitas Dinamika Bangsa
  • Lies Aryani Universitas Dinamika Bangsa
Keywords: Deep Learning, BiLSTM, Text Classification, Word Embedding, Clinical Trials


In recent years, deep learning methods have become increasingly popular, especially for big data, because such data sets are very large and still need to be predicted accurately. One example of big data is the text of cancer clinical trial documents. Clinical trials are studies in which human participants take part in order to improve people's safety and health. The aim of this paper is to classify cancer clinical trial texts from a public dataset. The proposed approach combines Bidirectional Long Short-Term Memory (BiLSTM) with Word Embedding (WE) features. This study contributes a new classification model for clinical trial documents that improves classification performance. Two experiments are conducted: BiLSTM without WE and BiLSTM with WE. BiLSTM without WE achieved accuracy = 86.2, precision = 85.5, recall = 87.3, and F1-score = 86.4. BiLSTM with WE outperformed it on all four metrics, showing strong performance on clinical trial text classification with accuracy = 92.3, precision = 92.2, recall = 92.9, and F1-score = 92.5.
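The abstract names the two ingredients (a BiLSTM and word-embedding features) but not their configuration. The following is a minimal, untrained forward-pass sketch in plain Python, assuming a single-layer BiLSTM whose final forward and backward hidden states are concatenated and passed to a softmax layer. The class names, dimensions, two-class output, toy vocabulary, and the one-hot input used for the "without WE" variant are all illustrative assumptions rather than the authors' setup; training (backpropagation) and text preprocessing are omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM direction with input (i), forget (f), candidate (c) and output (o) gates."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # Each gate sees the concatenation [x_t; h_{t-1}].
        self.W = {g: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for g in "ifco"}
        self.b = {g: [0.0] * hidden_dim for g in "ifco"}
        self.b["f"] = [1.0] * hidden_dim  # common forget-gate bias initialisation

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation = [x_t; h_{t-1}]
        pre = {g: [wx + bias for wx, bias in zip(matvec(self.W[g], z), self.b[g])]
               for g in "ifco"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def run(self, xs):
        """Consume the sequence xs and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in xs:
            h, c = self.step(x, h, c)
        return h


class BiLSTMClassifier:
    """Hypothetical BiLSTM text classifier (untrained, forward pass only)."""

    def __init__(self, vocab, embed_dim, hidden_dim, num_classes,
                 use_embedding=True, seed=0):
        rng = random.Random(seed)
        self.vocab_size = len(vocab)
        self.index = {w: k for k, w in enumerate(vocab)}  # index 0 is used for unknown words
        self.use_embedding = use_embedding
        input_dim = embed_dim if use_embedding else self.vocab_size
        # Dense word-embedding table: one learnable vector per vocabulary entry.
        self.embed = rand_matrix(self.vocab_size, embed_dim, rng)
        self.fwd = LSTMCell(input_dim, hidden_dim, rng)  # reads left to right
        self.bwd = LSTMCell(input_dim, hidden_dim, rng)  # reads right to left
        self.W_out = rand_matrix(num_classes, 2 * hidden_dim, rng)

    def encode(self, token):
        k = self.index.get(token, 0)
        if self.use_embedding:
            return self.embed[k]
        # Baseline assumption: a sparse one-hot vector instead of an embedding.
        return [1.0 if j == k else 0.0 for j in range(self.vocab_size)]

    def predict_proba(self, tokens):
        xs = [self.encode(t) for t in tokens]
        h_fwd = self.fwd.run(xs)
        h_bwd = self.bwd.run(xs[::-1])
        logits = matvec(self.W_out, h_fwd + h_bwd)  # concatenated [h_fwd; h_bwd]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    vocab = ["<unk>", "patients", "with", "metastatic", "breast", "cancer", "are", "eligible"]
    tokens = "patients with metastatic breast cancer are eligible".split()
    for flag in (True, False):
        model = BiLSTMClassifier(vocab, embed_dim=8, hidden_dim=4, num_classes=2,
                                 use_embedding=flag)
        print("with WE" if flag else "without WE", model.predict_proba(tokens))
```

Turning `use_embedding` off replaces the dense lookup with one-hot vectors whose length grows with the vocabulary; this is only one plausible reading of the "without WE" baseline. A real experiment would normally use a framework such as Keras or PyTorch, with learned or pre-trained embeddings.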

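As a consistency check on the reported numbers, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two scores. The sketch below does this in plain Python and also shows how the four metrics are derived from true and predicted labels in a binary task. The function names and toy labels are hypothetical, and the check assumes the reported precision, recall, and F1 refer to the same class or averaging scheme, which the abstract does not state.

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1(precision, recall) if precision + recall else 0.0,
    }


# Scores as reported in the abstract.
reported = {
    "BiLSTM without WE": {"accuracy": 86.2, "precision": 85.5, "recall": 87.3, "f1": 86.4},
    "BiLSTM with WE": {"accuracy": 92.3, "precision": 92.2, "recall": 92.9, "f1": 92.5},
}

if __name__ == "__main__":
    for name, s in reported.items():
        print(f"{name}: recomputed F1 = {f1(s['precision'], s['recall']):.1f} (reported {s['f1']})")
    gain = reported["BiLSTM with WE"]["accuracy"] - reported["BiLSTM without WE"]["accuracy"]
    print(f"Accuracy gain from word embeddings: {gain:.1f} points")
    # Toy labels (hypothetical, not from the paper's dataset).
    print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0, 1, 0]))
```

Run as a script, it reproduces both reported F1 values (86.4 and 92.5) from the corresponding precision and recall, shows an accuracy gain of 6.1 points for the WE model, and returns 0.75 for all four metrics on the toy labels.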




How to Cite
Jasmir, J., Riyadi, W., Agustini, S. R., Arvita, Y., Meisak, D., & Aryani, L. (2022). Bidirectional Long Short-Term Memory and Word Embedding Feature for Improvement Classification of Cancer Clinical Trial Document. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(4), 505–510.
Information Technology Articles