Covid-19 Detection Using Convolutional Neural Networks (CNN) Classification Algorithm

The coronavirus that causes COVID-19 is a new virus that emerged in 2019. Coronaviruses are viruses that cause disease in animals and humans. In humans, the coronavirus attacks the respiratory system, so people who are exposed to it experience a respiratory infection. This research classifies lung X-ray images of patients affected by the coronavirus. The classification focuses on three classes, namely Covid, Normal, and Viral Pneumonia. The study uses a lung X-ray image dataset organized into 4 folders, namely Scenario 1, Scenario 2, Scenario 3, and Scenario 4.


Introduction
The coronavirus that causes COVID-19 is a new virus that emerged in Wuhan, China, in December 2019. The virus has since spread to various countries and cities [1]. The epidemic has been designated a pandemic because it has reached countries all over the world and has become a global threat [2]. Coronaviruses are viruses that cause disease in animals and humans. In humans, the coronavirus attacks the respiratory tract. Since the virus was found in Wuhan, the number of cases has increased greatly, reaching 103,000 patients affected by COVID-19 and 4,636 deaths.
Coronaviruses cause illnesses that begin with symptoms such as coughs and colds and can lead to serious respiratory infections such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS) [3]. Symptoms that appear after exposure to the coronavirus include muscle pain, cough with phlegm, diarrhea, and sore throat [4]. The disease is widely known as coronavirus disease 2019 (COVID-19), an acute infectious disease caused by the novel coronavirus [5].
Scientists and doctors have tried various ways to slow the rapid growth in the number of patients affected by the coronavirus. Fighting this disease is very difficult, a cure is arguably unlikely, and the virus spreads very quickly, but the growth of the outbreak can be slowed through medical care. This can reduce the number of deaths and transmissions. The measures taken include tracing the origin of each patient's infection, isolating cities or countries, and conducting tests to detect the coronavirus.
A suitable test for detecting the coronavirus with reliable diagnostic results is the polymerase chain reaction (PCR) test [6]. In addition to the PCR test and the rapid test (antibody test), another method is an antibody check performed on the patient's blood, which shows whether the patient has been exposed to the SARS-CoV-2 virus. If the patient's body has formed antibodies in response, this indicates that the patient has been exposed to COVID-19. The accuracy obtained with the antibody test or blood test is 88.3% [7]. Blood tests can be done 6 months after the second vaccine has been given.
Therefore, this problem calls for approaches that automatically classify digital images of human lungs. Data mining techniques classify data into classes or groups. The discovery process yields knowledge that distinguishes classes or groups of data, which can support decision making and can predict the class of objects whose class label is unknown [8]. One approach computes lung texture features to distinguish whether a patient's lung nodules are malignant or not, using the Support Vector Machine (SVM) method as the classifier [9]. The SVM method achieved an accuracy of 84.58%. Another method reported as suitable for classifying patient lung cases is topic modeling. Achieving high accuracy requires an architecture with an increasing number of layers. It also depends on computing performance, for example the Graphics Processing Unit (GPU) available in Google Colab [10].
The Convolutional Neural Network (CNN) is one method of deep learning [11]. Image recognition using a CNN can stand in for the human eye because it is sensitive to the sharpness and contrast of an image and can therefore produce very clear results [12]. A CNN has a multi-layer arrangement consisting of convolutional layers, pooling layers, and fully connected layers. The CNN layers arrange their neurons in 3 dimensions [13].
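To make the three-dimensional layer arrangement more concrete, the following sketch traces the (height, width, channels) shape of an image through convolution and pooling layers up to the flattened vector that a fully connected layer would receive. The input size, filter counts, and kernel sizes are illustrative assumptions and are not the configuration used in this study.

```python
def conv2d_shape(shape, filters, kernel, stride=1, padding=0):
    """Output (height, width, channels) of a 2D convolution layer."""
    height, width, _ = shape
    out_height = (height + 2 * padding - kernel) // stride + 1
    out_width = (width + 2 * padding - kernel) // stride + 1
    # The number of output channels equals the number of filters.
    return (out_height, out_width, filters)


def pool2d_shape(shape, pool=2, stride=None):
    """Output shape of a 2D pooling layer; the channel count is unchanged."""
    stride = stride or pool
    height, width, channels = shape
    return ((height - pool) // stride + 1, (width - pool) // stride + 1, channels)


def flatten_size(shape):
    """Number of values passed to the first fully connected layer."""
    height, width, channels = shape
    return height * width * channels


# Hypothetical input: a 224 x 224 RGB image (assumed, not from the paper).
x = (224, 224, 3)
x = conv2d_shape(x, filters=32, kernel=3)   # (222, 222, 32)
x = pool2d_shape(x)                         # (111, 111, 32)
x = conv2d_shape(x, filters=64, kernel=3)   # (109, 109, 64)
x = pool2d_shape(x)                         # (54, 54, 64)
features = flatten_size(x)
print(x, features)
```

Each layer keeps a three-dimensional volume; only the flatten step collapses it into a one-dimensional vector.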
Previous research has addressed the classification of lung X-rays, including research conducted by Bambang Pilu Hartato. That study proposed a classification of lung images by applying the Convolutional Neural Network method for the detection of SARS-CoV-2, using 1,345 images. The model achieved an accuracy of 98.69%, a sensitivity of 97.71%, and a specificity of 98.90%. The researcher also hopes that the system will be developed further for classifying X-ray images of the lungs [2].
Further research conducted by Widi Hastomo and Adhitio Satyo Bayangkari Karno concluded that predicting the type of Covid disease with three CNN architectures worked very well, with accuracy above 90%. The best of the three, ResNet, reached 99% accuracy, and the precision in each class was more than 95%: Covid 99%, Lung_Opacity 97%, Normal 99%, and Viral_Pneumonia 99% [14]. In that study, Covid chest X-ray diagnosis was carried out using a Convolutional Neural Network with the ResNet-152 architecture, which resulted in an accuracy of 99%.
In contrast, the research conducted by Muhammad, Fauzan Haq, and Riza Ibnu Adam uses the Support Vector Machine (SVM) algorithm. The researchers classified chest X-ray images of patients exposed to the Covid-19 virus after extracting texture features from the images with the gray-level co-occurrence matrix (GLCM) method. This method reached an accuracy of 90.47%. However, errors occurred in testing, where some normal images and some Covid-19 chest X-ray images were misclassified [15].
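As an illustration of the GLCM texture extraction mentioned above, the sketch below builds a gray-level co-occurrence matrix for a tiny image and derives the contrast feature from it. The 4 x 4 image, the four gray levels, and the horizontal offset are assumptions chosen for illustration; they are not taken from the cited study, which may use different offsets and features.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip neighbours that fall outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def normalize(matrix):
    """Turn co-occurrence counts into joint probabilities."""
    total = sum(sum(row) for row in matrix)
    return [[value / total for value in row] for row in matrix]


def contrast(p):
    """GLCM contrast: sum of p(i, j) * (i - j)^2 over all level pairs."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Hypothetical 4 x 4 image quantized to 4 gray levels (0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
counts = glcm(image, levels=4)
texture_contrast = contrast(normalize(counts))
print(counts, texture_contrast)
```

A feature vector built from values such as this contrast could then be passed to a classifier like SVM; the matrix here is not symmetrized, which is one of several common conventions.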
Subsequent research was carried out by Yuli, Sugondo, and Thomhert, who proposed the detection of Covid disease in X-ray images using a deep residual network, which resulted in 99.00% accuracy, 98.00% precision, 95.00% recall, and 97% F1. The researchers want this system to be used for a larger population exposed to the virus [12].
Research conducted by Fatchul Arifin, Herjuna Artanto, and Nurhasanah aimed to produce a COVID-19 early detection system based on chest X-ray images, using a Convolutional Neural Network model to be applied to mobile applications. Both models in that study succeeded in detecting the conditions of COVID-19, normal, and viral pneumonia with an overall average accuracy of 93.24% based on test results. The Single Shot Detection MobileNet V1 model can detect COVID-19 with an average accuracy of 83.7%, while the Single Shot Detection MobileNet V2 model can detect COVID-19 with an average accuracy of 87.5%. Based on that research, it can be concluded that COVID-19 can be detected in chest X-rays using the MobileNet Single Shot Detection model [16].
T. Siswantining and R. Parlindungan used a dataset of chest X-ray images of Covid-19 cases. The data comprised 170 images, with 130 used as training data and 40 as test data. The study used Artificial Neural Networks, the Support Vector Machine (SVM) method, and the Convolutional Neural Network (CNN), which were then combined through Stacking, one of the Ensemble Learning methods. The results showed that the best accuracy was obtained from the Stacking model, with an accuracy of 95%. The researchers want this system to be able to display chest X-ray images [17].
Research conducted by Windra Swastika proposed the detection of COVID-19 using deep learning based on the VGG16 algorithm, with an accuracy of 92.86% [1].
Based on the seven studies above, the researchers decided to build a system that can detect Covid-19 using the Convolutional Neural Networks (CNN) classification algorithm. The dataset used in this study is the Covid-19 Radiography dataset obtained from Kaggle, which contains 2,905 images. The dataset has 3 classes of images, namely Covid, Normal, and Viral Pneumonia [18].

Research Methods
This research method has several stages, namely data collection, modeling of Convolutional Neural Networks (CNN), the model training process, the model testing and validation process, and model evaluation. The research flow can be seen in Figure 1.
After the results are obtained, the next step is to create a plot that displays all the results of the training process. The results are depicted as a line graph. This plot is useful for seeing whether each iteration brings an improvement. The graph can also be used to see whether the model is overfitting. The plotting results can be seen in Figure 5. Figure 5 shows that loss and accuracy changed significantly in the initial iterations, while later iterations brought only small further changes. It can also be seen that the more iterations, the more stable the results and the closer they are to 1. After the graph of the training results is obtained, the next step is to evaluate the models that have been built. The performance details are then presented as a classification report in Table 3.
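The overfitting check that such a training graph supports can also be expressed numerically. The sketch below flags possible overfitting when, over the last few epochs, the training loss keeps falling while the validation loss rises. The loss values, the three-epoch window, and the function name are hypothetical; they are not results or code from this study.

```python
def shows_overfitting(train_loss, val_loss, window=3):
    """Return True if training loss fell while validation loss rose
    over the last `window` epochs."""
    if len(train_loss) != len(val_loss):
        raise ValueError("both histories must cover the same epochs")
    if len(train_loss) <= window:
        return False  # not enough epochs to judge a trend
    train_falling = train_loss[-1] < train_loss[-1 - window]
    val_rising = val_loss[-1] > val_loss[-1 - window]
    return train_falling and val_rising


# Hypothetical per-epoch losses (assumed values for illustration only).
healthy_train = [0.90, 0.55, 0.40, 0.33, 0.30, 0.28]
healthy_val = [0.95, 0.60, 0.46, 0.40, 0.37, 0.36]
overfit_train = [0.90, 0.50, 0.30, 0.18, 0.10, 0.05]
overfit_val = [0.95, 0.60, 0.45, 0.48, 0.55, 0.63]

healthy_flag = shows_overfitting(healthy_train, healthy_val)
overfit_flag = shows_overfitting(overfit_train, overfit_val)
print(healthy_flag, overfit_flag)
```

In a line graph, the second case appears as a widening gap between the training and validation curves.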
After the results are obtained, the next step is to create a plot that displays all the results of the training process. The results are depicted as a line graph. This plot is useful for seeing whether each iteration brings an improvement. The graph can also be used to see whether the model is overfitting. The plotting results can be seen in Figure 6. Figure 6 shows that loss and accuracy changed significantly in the initial iterations, while later iterations brought only small further changes. It can also be seen that the more iterations, the more stable the results and the closer they are to 1. After the graph of the training results is obtained, the next step is to evaluate the models that have been built.
After the results are obtained, the next step is to create a plot that displays all the results of the training process. The results are depicted as a line graph. This plot is useful for seeing whether each epoch of model training brings an improvement. The graph can also be used to see whether the model is overfitting. The plotting results can be seen in Figure 7.
Figure 7 shows that loss and accuracy changed significantly in the initial iterations, while later iterations brought only small further changes. It can also be seen that the more iterations, the more stable the results and the closer they are to 1. After the graph of the training results is obtained, the next step is to evaluate the models that have been built. The performance details are then presented as a classification report in Table 7.
After the results are obtained, a plot is made to display all the results of the training process. The results are depicted as a line graph, which is useful for seeing whether each iteration brings an improvement. The graph can also be used to see whether the model is overfitting. The plotting results can be seen in Figure 8.
Figure 8 shows that loss and accuracy changed significantly in the initial iterations, while later iterations brought only small further changes. It can also be seen that the more iterations, the more stable the results and the closer they are to 1. After the graph of the training results is obtained, the next step is to evaluate the models that have been built. Compared with previous studies, the proposed model increased accuracy in scenario 2 by 1.88% and in scenario 4 by 0.23%, while scenario 1 and scenario 3 experienced a decrease in accuracy of 0.82% and 5.51%, respectively.
These results show that the scenarios with 3 classes experienced a decrease in performance, while the scenarios with 2 classes experienced an increase. Scenarios with balanced data and scenarios with unbalanced data each include both an increase and a decrease. The decrease for the models with 3-class data can occur because the model architecture differs from that of previous studies, which lowers the performance of the model.
From this it can be seen that a balanced data composition does not by itself improve performance. The architecture of the model still has a great influence. In addition, this can happen because, when the data composition is equalized, every class is generally reduced to the size of the smallest class, so the selected data do not necessarily have better quality than the discarded data.
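The balancing step described above, in which every class is reduced to the size of the smallest class, can be sketched as random undersampling. The file names and the fixed seed are hypothetical. The Covid and Normal counts follow the figures reported in this paper, and the Viral Pneumonia count is inferred here as the remainder of the 2,905 images, which is an assumption.

```python
import random


def undersample(files_by_class, seed=0):
    """Randomly keep only as many files per class as the smallest class has."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    smallest = min(len(files) for files in files_by_class.values())
    return {
        label: rng.sample(files, smallest)
        for label, files in files_by_class.items()
    }


# Class sizes: Covid and Normal as reported; Viral Pneumonia inferred
# as 2905 - 219 - 1341 (assumption).
counts = {"Covid": 219, "Normal": 1341, "Viral Pneumonia": 1345}

# Hypothetical file names standing in for the real images.
dataset = {
    label: [f"{label}_{i}.png" for i in range(n)]
    for label, n in counts.items()
}

balanced = undersample(dataset)
discarded = sum(counts.values()) - sum(len(files) for files in balanced.values())
print({label: len(files) for label, files in balanced.items()}, discarded)
```

Because the selection is random, it takes no account of image quality, which is consistent with the point that the retained images are not necessarily better than the discarded ones.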

Conclusion
This Covid classification research uses deep learning with the Convolutional Neural Network (CNN) algorithm. The best model achieved an accuracy of 97.87%, while the lowest accuracy was 91.66%. The experimental results indicate that balanced and unbalanced data do not significantly affect the performance of the binary classification model with 2 classes, but they do affect the categorical classification with 3 classes. Therefore, a model architecture suited to the data composition and the number of classes greatly affects the level of model performance.

Figure 1. Research Flow
Figure 1 shows the research process. The first step is to find a suitable dataset for this research. The second step is data splitting. The third step is designing and modifying a model for classifying Covid in lung images, which is then trained on the training data. The fourth step is evaluation. The evaluation gives the final conclusion on the model that has been built, in order to obtain accurate results for the image classification formed during modeling in this study.
2.1. Dataset
The data used for this research is the Covid-19 Radiography image dataset. The images were obtained through direct observation by Qatar University, Doha, Qatar, and the University of Dhaka, Bangladesh, along with collaborators from Pakistan and Malaysia in

Figure 4. Flow of Modeling in Scenario 2 and Scenario 4

Figure 5. Graph of Training Results Scenario Model 1

Figure 6. Graph of Training Model Results

Figure 7. Graph of Training Model Results
3.4. Scenario 4
Scenario 4 uses 3 classes, namely Covid, Normal, and Viral Pneumonia. The data in each class are split into 70% for training and 30% for validation. The result is 459 training images and 198 validation images, divided evenly among the classes. The accuracy results for Scenario 4 can be seen in Table 8.
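The split sizes reported for Scenario 4 can be reproduced with a short calculation. The sketch assumes that each of the three classes holds 219 images (the Covid count reported in this paper, applied to every class) and that the training share is rounded down within each class. Both are assumptions for illustration and are not confirmed details of the original pipeline.

```python
def split_counts(n_images, train_percent=70):
    """Return (train, validation) sizes for one class, rounding train down."""
    n_train = n_images * train_percent // 100  # integer arithmetic avoids float error
    return n_train, n_images - n_train


# Assumed per-class size for the balanced three-class scenario.
class_sizes = {"Covid": 219, "Normal": 219, "Viral Pneumonia": 219}

per_class = {label: split_counts(n) for label, n in class_sizes.items()}
total_train = sum(train for train, _ in per_class.values())
total_val = sum(val for _, val in per_class.values())
print(per_class, total_train, total_val)
```

Under these assumptions each class contributes 153 training and 66 validation images, which gives the totals of 459 and 198 stated in the text.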

2.3. Experimental Test Against Scenarios
The experiments on Scenario 1, Scenario 2, Scenario 3, and Scenario 4 involve 3 classes, namely the Covid, Normal, and Viral Pneumonia classes. The dataset is first split into 2 parts, namely train data and test data, in each category used in this study.
Table 1. Comparison of Dataset Splitting Results (columns: Scenario 1, Scenario 2, Scenario 3, Scenario 4)
Results and Discussions
In this study, there are 2 classes, namely Covid and Normal. Furthermore, this study also uses 4 scenarios organized into four folders, namely Scenario 1, Scenario 2, Scenario 3, and Scenario 4, each of which uses a different amount of image data.

Table 3. Comparison of Evaluation Results of Scenario Model 1
Scenario 2 uses 3 classes, namely Covid, Normal, and Viral Pneumonia. The data in each class are split into 70% for training and 30% for validation. The result is 2,037 training images and 873 validation images, divided among the classes. The accuracy results for Scenario 2 can be seen in Table 4.

Melly Damara Chaniago, Amellia Amanullah Sugiharto, Qhistina Dyah Khatulistiwa, Agus Eko, Zamah Sari. Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol. 6 No. 2 (2022). DOI: https://doi.org/10.29207/resti.v6i2.3823. Creative Commons Attribution 4.0 International License (CC BY 4.0).
The performance details are then presented as a classification report in Table 5.
Table 5. Comparison of Evaluation Results of Scenario Model 2
Scenario 3 uses 2 classes, namely Covid and Normal. There are 219 Covid images and 1,341 Normal images. The data are split into 70% for training and 30% for validation. The result is 459 training images and 198 validation images, divided among the classes. The accuracy results for Scenario 3 can be seen in Table 6.

Table 7. Comparison of Evaluation Results of Scenario Model 3

Table 9. Comparison of Evaluation Results of Scenario Model 4
The table above shows the evaluation results of the model in scenario 4: an accuracy of 91.41%, a precision of 91.33%, a recall of 91.66%, and an F1-score of 91.66%. The total number of images used in each scenario is compared in Table 10.
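For readers who want to see how the four values in a classification report relate to each other, the sketch below computes accuracy and macro-averaged precision, recall, and F1-score from a three-class confusion matrix. The confusion matrix is hypothetical and does not contain this study's predictions, and macro averaging is an assumption about how the reported values were aggregated.

```python
def classification_metrics(confusion):
    """confusion[i][j] = number of images of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,  # macro average over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical confusion matrix for 198 validation images (66 per class),
# rows and columns ordered as Covid, Normal, Viral Pneumonia.
confusion = [
    [60, 3, 3],
    [2, 59, 5],
    [4, 4, 58],
]
metrics = classification_metrics(confusion)
print({name: round(value, 4) for name, value in metrics.items()})
```

When every class has the same number of validation images, as in this hypothetical matrix, macro-averaged recall coincides with accuracy.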

Table 10. Results of Comparison of Number of Datasets for Each Scenario

Table 11 shows the comparison of the models from previous studies [2] with the proposed model. For the proposed model, the highest average accuracy obtained was 97.87% and the lowest average accuracy was 91.66%. The highest model performance occurs in scenario 1, which is a model with 3 classes and an unbalanced data composition. Meanwhile, the lowest model performance occurs in scenario 3, which is a model with 3 classes and a balanced data composition.

Table 11. Comparison of Precision Results for Each Model in Previous Research