Malaria Blood Cell Image Classification using Transfer Learning with Fine-Tune ResNet50 and Data Augmentation

Based on the WHO report on malaria, an estimated 241 million malaria cases and 627,000 deaths occurred globally in 2020, with the number of deaths increasing yearly. Early detection is the way to prevent malaria from worsening, so a faster and more precise diagnostic method is required to simplify and shorten the detection process. Machine learning and deep learning techniques can classify medical images rapidly and precisely. This research aims to diagnose malaria by classifying images of malaria blood cells using deep learning with a transfer learning approach. The proposed method extends previous studies by applying various fine-tuning procedures and data augmentation. Two models, Frozen ResNet50 and Fine-Tuned ResNet50, are tested, and the dataset is augmented to improve model performance. This study uses the "NIH Malaria Cell Images Dataset", which contains a total of 27,558 images divided into two classes: parasitized and uninfected. The results improve on previous research: the fine-tuned VGG16 model achieved an accuracy of 96%, whereas the fine-tuned ResNet50 model in this study achieves an accuracy of 98%.


Introduction
Malaria is an acute febrile illness caused by Plasmodium parasites, which are transmitted to humans through the bites of infected female Anopheles mosquitoes. Of the five Plasmodium species that cause malaria in humans, P. falciparum and P. vivax are the two most significant [1]. According to estimates in the WHO report on malaria, there were 241 million cases of the disease worldwide in 2020 and 627,000 deaths, with the number of deaths continuing to rise yearly [2]. In Indonesia, according to the Indonesian Ministry of Health, the province with the most malaria cases is Papua, where 86,022 cases were reported in 2021 [3].
Early detection and treatment are the best way to prevent malaria from worsening. Thick or thin blood smear examination remains the most reliable and widely used method for diagnosing malaria [4]. Giemsa staining, which stains both red blood cells (RBCs) and Plasmodium parasites, makes the parasites in blood samples easier to recognize and detect; staining is required to make Plasmodium parasites visible for detection [5].
With advances in computer vision, machine learning and deep learning methods can be used to classify medical images quickly and accurately [6]. Conventionally, machine learning approaches such as Naive Bayes, Decision Tree, Linear Discriminant, Support Vector Machine (SVM), and K-Nearest Neighbor (K-NN) have been widely used for malaria image classification [7] [8] [9] [10] [11]. However, the challenge in using machine learning lies in selecting suitable features to extract. Commonly selected features include color, shape, and texture, along with features derived from them, so recognition performance depends on choosing appropriate features [12] [13].
Deep learning is an alternative that performs feature extraction automatically, with the initial layers of the network capturing various primitive features well [14]. Deep learning has been used for classifying images of blood smears containing malaria-infected cells; for example, previous research used the VGG16 Convolutional Neural Network (CNN) [19]. However, deep learning does not always give the best classification results, because its performance depends on the dataset used: the learning process is significantly affected by the amount of data in each class. In addition, the choice of hidden layers, convolutional layer models, and other CNN parameters has a large impact on accuracy [20].
A. S. B. Reddy and D. S. Juliet published the first article using transfer learning with ResNet50 to diagnose malaria [21]. Transfer learning-based classification has demonstrated relatively high performance in medical image classification over the past few years. This research builds on previous studies [15] [21]: it uses the same dataset but adds data augmentation, which those studies lacked, to help optimize the proposed method. A ResNet50 model is used to classify malaria blood cell images, and its accuracy and other evaluation metrics are compared with those of previous methods.

Research Methods
The research begins with collecting an image dataset of Giemsa-stained blood smear samples, which is then separated into three folders: train data, validation data, and test data. Details of the research stages are shown in Figure 1. Next, each image in the dataset is resized. The last stage of data preprocessing is data augmentation. After preprocessing, the resulting data groups are processed further: the train and validation data are used to train the model, and the test data are used to evaluate the trained model with the specified parameters.
If Frozen ResNet50 still shows no appreciable improvement in performance, the process moves into the modification flow, in which the Fine-Tuned ResNet50 architecture is evaluated by applying various training parameters and model architectures across the two types of models.

Dataset
The dataset used in this study is the "Malaria Cell Images Dataset" from kaggle.com [22], which originates from the National Institutes of Health (NIH) [23].
The dataset contains two classes, Parasitized and Uninfected. Each image is a PNG file whose resolution varies around 150 x 150 pixels, and there are 27,558 blood smear images in total. In Figure 2, the top five images are blood smear images infected with malaria, while the bottom five are not infected. Thin blood smear slides stained with Giemsa were taken from 150 P. falciparum-infected patients and 50 healthy patients at Chittagong Medical College Hospital in Bangladesh, and each microscopic field of view on the slides was photographed using a smartphone's built-in camera. The dataset has a balanced distribution between the Parasitized and Uninfected classes. The data are split into 70% for training, 10% for validation, and 20% for testing, as described in Table 1.
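The 70/10/20 split can be sketched as follows. This is a minimal illustration, assuming each class is shuffled and split separately (a stratified split) and that the training and validation sizes are rounded while the test set takes the remainder; the paper does not state how it rounds or shuffles, and `split_counts`, `stratified_split`, and the seed are hypothetical names chosen for this sketch.

```python
import random

def split_counts(n, fractions=(0.7, 0.1, 0.2)):
    """Return (train, val, test) sizes for n items; test takes the remainder."""
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    return n_train, n_val, n - n_train - n_val

def stratified_split(paths_by_class, fractions=(0.7, 0.1, 0.2), seed=42):
    """Shuffle each class separately, then split it so every subset stays balanced."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, paths in paths_by_class.items():
        paths = list(paths)
        rng.shuffle(paths)
        n_train, n_val, _ = split_counts(len(paths), fractions)
        splits["train"] += [(p, label) for p in paths[:n_train]]
        splits["val"] += [(p, label) for p in paths[n_train:n_train + n_val]]
        splits["test"] += [(p, label) for p in paths[n_train + n_val:]]
    return splits

# 27,558 images split evenly gives 13,779 per class.
print(split_counts(13779))  # (9645, 1378, 2756)
```

Under these assumptions, each class contributes 2,756 test images, which matches the row totals of the confusion matrices reported later (for example, 2628 + 128 = 2756).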

Data Augmentation
Data augmentation is performed by transforming each existing image rather than adding new images, so the number of images in the dataset is the same before and after augmentation [24] [25]. The Image Data Generator is used during augmentation to change the shape of each image in the dataset by activating the parameters shown in Table 2. Each image is randomly transformed according to the activated parameters, while ensuring that the system can still distinguish Parasitized from Uninfected images during classification.
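The point that augmentation transforms images instead of adding new ones can be illustrated with a small generator that yields exactly one transformed copy of each image per epoch. This is a sketch, not the paper's configuration: Table 2's parameters are not reproduced here, so only a rotation range of 30° (mentioned in the Results section) and a hypothetical horizontal flip are sampled. The angle is drawn from [-30°, 30°], which is how Keras's ImageDataGenerator interprets `rotation_range=30`; whether the paper relied on exactly this behaviour is an assumption, and all function names are illustrative.

```python
import random

def augmentation_params(rng, rotation_range=30.0, flip_prob=0.5):
    """Draw one random set of transform parameters (a hypothetical subset of Table 2)."""
    return {
        "angle": rng.uniform(-rotation_range, rotation_range),
        "hflip": rng.random() < flip_prob,
    }

def hflip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def augmented_epoch(dataset, rng):
    """Yield one transformed copy of every (image, label) pair.

    The epoch has exactly len(dataset) samples: images replace the originals
    rather than being appended, and labels are left untouched.
    """
    for image, label in dataset:
        params = augmentation_params(rng)
        out = hflip(image) if params["hflip"] else image
        # A full pipeline would also rotate `out` by params["angle"] here.
        yield out, label, params

rng = random.Random(0)
dataset = [([[0, 1], [2, 3]], "Parasitized"), ([[4, 5], [6, 7]], "Uninfected")]
for epoch in range(2):
    batch = list(augmented_epoch(dataset, rng))
    print(epoch, len(batch))  # always 2: the dataset size never changes
```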

Fine-tuning
Fine-tuning is a technique in which the top few layers of the frozen base model are unfrozen, and the newly added classifier layers are trained jointly with these final layers of the base model. This makes it possible to "fine-tune" the high-level feature representations of the base model so that they are more applicable to a particular task [26].
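The difference between the frozen and fine-tuned stages comes down to which layers are allowed to update. The framework-agnostic sketch below models this with a hypothetical `Layer` class; in Keras the same effect is obtained by setting each layer's `trainable` attribute and recompiling the model afterwards. The layer names and the choice of unfreezing the top three layers are illustrative assumptions, since the paper does not state how many ResNet50 layers were unfrozen.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """Stand-in for a framework layer; only the `trainable` flag matters here."""
    name: str
    trainable: bool = True

def freeze_all(layers):
    """Frozen stage: no base-model weight is updated during training."""
    for layer in layers:
        layer.trainable = False

def unfreeze_top(layers, k):
    """Fine-tuning stage: only the last k layers of the base model are trainable."""
    freeze_all(layers)
    for layer in layers[max(0, len(layers) - k):]:
        layer.trainable = True

# Hypothetical 10-layer base model plus a newly added classifier head.
base = [Layer(f"block{i}") for i in range(1, 11)]
head = [Layer("dense"), Layer("output")]

freeze_all(base)          # stage 1: only the head is trained
unfreeze_top(base, k=3)   # stage 2: head and top 3 base layers are trained jointly
print([l.name for l in base + head if l.trainable])
# ['block8', 'block9', 'block10', 'dense', 'output']
```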

Model Architecture
The model architecture is a CNN that applies transfer learning with ResNet50. Instead of the default top layer of ResNet50, the researchers add several layers on top of the base model and fine-tune it [27]. The proposed Fine-Tune ResNet50 architecture is shown in Figure 3.

Mish Activation Function
ReLU is a well-known and widely used activation function. In this study, the proposed Fine-Tuned ResNet50 model instead applies Mish, which is considered a more powerful activation function. Mish is defined in Equation 1 and ReLU in Equation 2 [28].
$$f(x) = x \tanh(\operatorname{softplus}(x)) = x \tanh\left(\ln\left(1 + e^{x}\right)\right) \tag{1}$$
$$f(x) = \max(0, x) \tag{2}$$
Here, tanh() returns the hyperbolic tangent of a number, softplus() is a smooth approximation of ReLU, and max() returns the larger of its arguments. The self-gating property of Mish is considered more advantageous than activation mechanisms such as ReLU, which is a point-to-point function. Mish can be implemented in any CNN framework; it is non-monotonic and produces a smooth output at every point [29] [30].
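Equations 1 and 2 translate directly into code. The sketch below uses only Python's `math` module and computes softplus in a numerically stable form, max(x, 0) + ln(1 + e^(-|x|)), which equals ln(1 + e^x) but avoids overflow for large x; a deep learning framework would apply the same formulas element-wise to tensors. The function names are illustrative.

```python
import math

def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    # ln(1 + e^x) = max(x, 0) + ln(1 + e^-|x|); log1p keeps small terms accurate.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def mish(x):
    """Mish activation (Equation 1): x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))

def relu(x):
    """ReLU activation (Equation 2): max(0, x)."""
    return max(0.0, x)

for x in (-3.0, -1.0, 0.0, 1.0):
    print(x, round(mish(x), 4), relu(x))
```

Evaluating a few points shows the properties described above: unlike ReLU, Mish lets small negative values through, and it is non-monotonic, since mish(-3) is about -0.146 while mish(-1) is about -0.303.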
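The model is trained with the Adamax optimizer, which updates the parameters θ at step t as follows:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$
$$u_t = \max\left(\beta_2 u_{t-1}, \lvert g_t \rvert\right)$$
$$\theta_t = \theta_{t-1} - \frac{\alpha}{1 - \beta_1^{t}} \cdot \frac{m_t}{u_t}$$
where m_t is the first moment, u_t is the exponentially weighted infinity norm, g_t is the gradient, and β1 and β2 are the exponential decay rates. The abs() function returns the absolute value, and max() selects the larger of its arguments. The alpha (α) hyperparameter of the Adamax optimizer, i.e., the initial step size, starts at 2e-4 [31].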
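The Adamax update can be sketched for a single scalar parameter. This is an illustrative implementation of the standard Adamax rule (first moment, exponentially weighted infinity norm, bias-corrected step), assuming the common defaults β1 = 0.9 and β2 = 0.999, which the paper does not state; the small `eps` term in the denominator follows Keras's implementation rather than the original formulation. The toy objective and the larger step size in the demo are hypothetical, chosen only so that it converges quickly.

```python
def adamax_minimize(grad, theta0, lr=2e-4, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Minimize a 1-D function with Adamax, given its gradient function `grad`."""
    m, u, theta = 0.0, 0.0, theta0
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g      # first moment m_t
        u = max(beta2 * u, abs(g))             # infinity norm u_t
        # eps is not in the original equations; Keras adds it to avoid dividing by 0.
        theta -= (lr / (1.0 - beta1 ** t)) * m / (u + eps)
    return theta

# Toy problem: f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
# lr=0.1 (not the paper's 2e-4) so that the toy converges within a few thousand steps.
theta = adamax_minimize(lambda th: 2.0 * (th - 3.0), theta0=0.0, lr=0.1, steps=2000)
print(theta)
```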

Test Scenario
In this study, two classes are classified: Parasitized and Uninfected Giemsa blood smear samples. The dataset is divided into training, validation, and test data, which are used to train and test the models. The following are the three main test scenarios, all performed with data augmentation, as shown in Table 3.
Aris Muhandisin, Yufis Azhar Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol.

Results and Discussions
The findings of this study follow the steps set out in the Research Methods section. The "Malaria Cell Images Dataset" contains two classes: Uninfected and Parasitized. After being extracted from the zip archive, the data are divided into training, validation, and test sets with a ratio of 70%, 10%, and 20%. Images in all three sets are resized to 224 x 224 pixels. During normalization, every pixel value is divided by 255 so that intensities fall in the range [0, 1], with the highest possible intensity mapped to 1. As a result, all images share a single size, and the image data are fed into a single input layer.
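Resizing to 224 x 224 and dividing by 255 can be sketched in plain Python. This assumes nearest-neighbour resizing (the default interpolation of Keras's `flow_from_directory`; the paper does not name its method) and 8-bit pixel values. Images are represented as lists of rows, with RGB pixels as tuples, purely for illustration, and the helper names are hypothetical.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Nearest-neighbour resize of a 2-D image given as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

def normalize(image):
    """Divide every 8-bit value by 255 so that intensities fall in [0, 1]."""
    def scale(px):
        # RGB pixels are tuples; grayscale pixels are plain numbers.
        return tuple(c / 255.0 for c in px) if isinstance(px, tuple) else px / 255.0
    return [[scale(px) for px in row] for row in image]

# Hypothetical 150 x 150 RGB cell image filled with a single colour.
cell = [[(200, 100, 50)] * 150 for _ in range(150)]
x = normalize(resize_nearest(cell))
print(len(x), len(x[0]), x[0][0])
```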
The following steps apply several data augmentation techniques, starting with random rotation of up to 30°, which was sufficient for this case. A 30° rotation setting does not rotate every image by exactly 30°; instead, each image is rotated by a random angle within that range at each epoch. This is useful for augmentation, because in each epoch the model is trained on a differently transformed version of every image.
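Random rotation can be sketched with inverse mapping: each output pixel is rotated back by the sampled angle about the image centre and takes the value of the nearest source pixel. This is a simplified illustration, assuming a constant fill value of 0 for pixels that fall outside the source (Keras's ImageDataGenerator defaults to `fill_mode='nearest'` instead) and an angle drawn from [-30°, 30°]; the helper names are hypothetical.

```python
import math
import random

def rotate_nearest(image, angle_deg, fill=0):
    """Rotate a 2-D image (list of rows) about its centre with nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            # Inverse mapping: rotate the output coordinate back to the source image.
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = image[iy][ix]
    return out

def random_rotation(image, rng, max_angle=30.0):
    """Draw a new angle in [-max_angle, max_angle] on every call (i.e., every epoch)."""
    angle = rng.uniform(-max_angle, max_angle)
    return rotate_nearest(image, angle), angle

rng = random.Random(7)
img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for epoch in range(3):
    rotated, angle = random_rotation(img, rng)
    print(f"epoch {epoch}: angle {angle:+.1f} deg")
```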
One reason for adopting ResNet50 is a fundamental weakness of the VGG architecture: the vanishing gradient problem. The validation accuracy graph of the previous study [15] rises and falls in a less stable way. The ResNet design addresses the vanishing gradient problem through residual learning. In addition, VGG is slower than the more modern ResNet architecture, whose residual learning concept is a considerable improvement.
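Why identity shortcuts help can be seen in a toy scalar model. If each layer contributes a local derivative d, a plain chain of n layers multiplies the gradient by d^n, whereas a residual block y = x + F(x) has derivative 1 + F'(x), so the product stays near 1 when d is small. The numbers below are hypothetical and ignore weights, nonlinearities, and batch normalization; they only illustrate the mechanism.

```python
def chain_gradient(local_derivs, residual):
    """Product of per-layer derivatives along one path through the network."""
    grad = 1.0
    for d in local_derivs:
        # Plain layer: dy/dx = d.  Residual block y = x + F(x): dy/dx = 1 + d.
        grad *= (1.0 + d) if residual else d
    return grad

derivs = [0.01] * 20  # hypothetical small local derivative in each of 20 layers
plain = chain_gradient(derivs, residual=False)
res = chain_gradient(derivs, residual=True)
print(f"plain: {plain:.1e}, residual: {res:.3f}")  # plain: 1.0e-40, residual: 1.220
```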
To assess the effect of data augmentation, we also tested the best method, the Fine-Tune ResNet50(2) model, without data augmentation; this is explained in detail in the next section.

Model Performance Comparison
The ResNet50 models are then trained with binary cross-entropy loss, since there are only two classes, using a batch size of 20 for 100 epochs. After training, the Frozen and Fine-Tuned ResNet50 models with different parameters differ in accuracy, as listed in Table 4. In these experiments, all models with higher learning rates achieve better accuracy, although the models also differ in other parameters. Using the Adamax optimizer with the Mish activation function and a higher learning rate in the fine-tuned model increases accuracy by 2% compared with Frozen ResNet50.
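Binary cross-entropy, the loss used here for the two-class problem, averages -[y ln p + (1 - y) ln(1 - p)] over the samples, where y is the true label and p the predicted probability of class 1. A minimal sketch, with predictions clipped to [eps, 1 - eps] (eps = 1e-7, the Keras default backend epsilon) so the logarithm stays finite; the function name and sample values are illustrative.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; y_true in {0, 1}, y_pred = predicted P(class 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical batch: two confident correct predictions and one wrong one.
print(binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.2]))
```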
Based on the classification report, each experiment has different precision and recall values for each class. Table 5 shows that Fine-Tuned ResNet50 with a higher learning rate increases average precision and recall by 1% compared with Frozen ResNet50 with a lower learning rate when distinguishing the Parasitized class from the Uninfected class. Because some of the metrics of Fine-Tuned ResNet50(1) do not differ appreciably from those of Fine-Tuned ResNet50(2), Fine-Tuned ResNet50(1) is not discussed further. The loss and accuracy plots from training the Frozen ResNet50 model are shown in Figure 4 and Figure 5.
Figures 6 and 7 show the training plots of Fine-Tune ResNet50(2) without data augmentation. Both plots show severe overfitting, especially the loss plot, where a clearly visible gap between val_loss and train_loss persists from epoch 30 to epoch 100. At the 100th epoch, val_accuracy is 96.95% while train_accuracy reaches 99.75%, probably higher than that of any method that uses data augmentation, and val_loss is also higher than train_loss.
With data augmentation, Fine-Tune ResNet50(2) reaches a best-fit condition, unlike the same model without data augmentation, and its accuracy plot is also more stable than that of the Frozen ResNet50 model. The resulting val_accuracy is 97.60%, with a relatively low val_loss.
Finally, Figure 10 shows that the ROC curve of the Fine-Tuned ResNet50(2) model is better than that of Frozen ResNet50: it lies closer to a sensitivity of 1 at higher specificity.
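An ROC curve is obtained by sweeping the decision threshold over the model's predicted scores and recording the false positive rate (1 - specificity) and true positive rate (sensitivity) at each threshold; the area under the curve summarizes it in one number. The sketch below assumes binary labels with 1 as the positive class and uses hypothetical scores; libraries such as scikit-learn provide equivalent `roc_curve` and `auc` functions.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points from sweeping the threshold from high to low.

    labels: 1 for the positive class, 0 otherwise; tied scores share one threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores for three positive and three negative cells.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.1]
print(auc(roc_points(labels, scores)))
```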
After carrying out the three experiments above, the Fine-Tuned ResNet50 model with a higher learning rate and the Mish activation function performed better than the Frozen ResNet50 model. This is shown by the decreasing val_loss of the Fine-Tune ResNet50(2) model and by val_accuracy, which increases consistently in both fine-tuned models.

Confusion Matrix
The confusion matrix is used to evaluate the effectiveness of the developed classification method. Figure 11 shows the confusion matrix generated by Frozen ResNet50: in the Parasitized class, 2628 images are correctly predicted and 128 are incorrectly predicted, while in the Uninfected class, 2669 images are correctly predicted and 87 are incorrectly predicted. Figure 12 shows the confusion matrix of Fine-Tune ResNet50(2): in the Parasitized class, 2661 images are correctly predicted and 95 are wrongly predicted, while in the Uninfected class, 2718 images are correctly predicted and 38 are incorrectly predicted.
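The counts in Figures 11 and 12 are enough to recompute the headline metrics. This sketch treats Parasitized as the positive class, so misclassified Parasitized images are false negatives and misclassified Uninfected images are false positives; the helper name is hypothetical.

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, precision, and recall with Parasitized as the positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision_parasitized": tp / (tp + fp),
        "recall_parasitized": tp / (tp + fn),
        "precision_uninfected": tn / (tn + fn),
        "recall_uninfected": tn / (tn + fp),
    }

frozen = metrics(tp=2628, fn=128, fp=87, tn=2669)      # Figure 11
finetuned = metrics(tp=2661, fn=95, fp=38, tn=2718)    # Figure 12
print(f"Frozen accuracy: {frozen['accuracy']:.4f}")          # 0.9610
print(f"Fine-tuned accuracy: {finetuned['accuracy']:.4f}")   # 0.9759
```

Rounded to two decimals, these values reproduce the 96% accuracy of Frozen ResNet50, and the 98% accuracy with 99% (Parasitized) and 97% (Uninfected) precision of Fine-Tune ResNet50(2) reported in the text.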

Performance Comparison of the Best Model with Previous Research
After performing several test scenarios, the next step is to compare the performance of the best model with the accuracy values obtained in previous studies. As shown in Figure 13, the classification report for the Fine-Tuned ResNet50(2) model, which uses a learning rate of 3e-4, shows an accuracy of 98%, with a precision of 99% for the Parasitized class (class 0) and 97% for the Uninfected class (class 1). This accuracy exceeds that of previous research [21] by 3%, with a different fine-tuning procedure.

Conclusion
After conducting all stages of the research, it can be concluded that in the case of malaria disease image classification by utilizing data augmentation techniques described above to optimize ResNet50 in the Frozen model shows a good accuracy value of 96%. However, by performing a fine-tuning procedure using the Mish activation function with the Adamax optimizer and adjusting the learning rate, the accuracy increased by 2% also the data augmentation is proven to reduce overfitting conditions. For more comprehensive testing, different gradient descent optimization, such as Nadam or SGD, can be utilized to improve model performance, allowing for the possibility of additional study. Although several variables, such as the epochs, frequency validation, and batch size, are configurable. Some suggestions above are aimed to achieve better accuracy results.