Herbal Leaves Classification Based on Leaf Image Using CNN Architecture Model VGG16

Herbal leaves are widely used by people for health purposes. The problem is a lack of knowledge about the types of herbal leaves, and the difficulty that ordinary people who are not familiar with plants have in telling them apart. If the wrong type of plant is used, it can have a negative impact on health. Automatic classification with the help of technology can reduce the risk of misidentifying herbal leaf types, and this requires a precise and accurate herbal leaf detection process. This research aims to build a classification model for herbal leaf images with a higher accuracy value than previous research. The proposed method is therefore a Transfer Learning approach, namely a Convolutional Neural Network (CNN) with a pretrained VGG16 model. This research uses a dataset of herbal leaves with a total of 10 classes: Belimbing Wuluh, Jambu Biji, Jeruk


Introduction
Herbal plants have many benefits for human life beyond serving as foodstuffs, oxygen providers, and so on. They can also be used specifically for medical therapy in the health sector [1]. One example is the use of bay leaves (Eugenia Polyantha Wight) as a food flavoring spice and as herbal medicine [2]. Herbal plants can serve as medicinal plants or ornamental plants, depending on how they are used. They are identified through human observation, and they function to prevent and cure diseases [3]. Herbal plants are often grown as family medicinal plants because they are well suited to that role. Family herbal plants bring various benefits, such as improving nutrition, increasing income, greening the environment, and fulfilling other daily needs. Related interview-based research covering 16 types of herbal plants found that many people do not use the properties of herbal plants correctly. In the treatment of diseases, there are still misconceptions about the efficacy of herbal plants [1]. According to [4], around 80% of people depend on herbal plants for their health.
To determine the type of a herbal plant, several things can be considered, such as pattern recognition, plant shape, plant texture, and structural characteristics [5]. The part of the herbal plant most often used in medicinal therapy is the leaf [6]. Leaves play a very important role in herbal plants and are also more readily available than other parts such as roots. Classification of herbal leaves is generally done only by observing leaf shape and color. Because there are so many types of herbal plants, it is difficult for ordinary people to distinguish them by eye based on leaf shape and color alone. Distinguishing between types of herbal plants requires information and knowledge in this field. Manual identification takes a long time and requires special knowledge [7]. It takes an expert or experienced person to correctly classify several types of herbal leaves [8]. Misclassification can also lead to errors in the composition of herbal concoctions that combine several types of leaves for medicinal purposes. Therefore, a tool is needed that can classify various types of herbal plants effectively and accurately, with the aim of helping people identify herbal leaves without special knowledge. One solution is to use technology to provide the public with information on the classification of herbal leaves [7].
One family of artificial intelligence algorithms that can classify an object through machine learning is Deep Learning. Deep Learning is a development of Machine Learning and is often used for digital image processing. An image processing system can help classify objects quickly and accurately while handling a large amount of data [9]. The reason for using a Deep Learning algorithm is to optimize performance on unstructured data. A commonly used Deep Learning algorithm is the Convolutional Neural Network (CNN). Derived from the Multilayer Perceptron (MLP), the CNN is a method designed to process data in two-dimensional form, such as sound or images [10].
Deep Learning requires a large amount of data to get good results. In related research, the number of parameters to be trained has been reduced by reusing models that have already been trained, so that they can classify new datasets without training from scratch; this method is called Transfer Learning [11]. Transfer Learning is a method that uses a Convolutional Neural Network (CNN) with a pretrained model. It does not require training from scratch because the weights from the trained model are applied to the new dataset [12]. Classification using the Transfer Learning method can improve on the performance of other Transfer Learning models and strategies by applying an end-to-end CNN model [13].
Research on classifying herbal leaf types with the help of technology has developed with various models and results. Research on high-resolution leaf images using the Convolutional Neural Network (CNN) method obtained an accuracy of 82% on the testing dataset. That research recognized objects from 5 (five) types of plants. However, there were still prediction errors in the banana plant class, because it has a geometric correction that is almost the same as other types of plants [14]. A detection method using a CNN with 7 Convolutional Layers obtained an accuracy between 80% and 100% on 80 testing images, with an average accuracy of 0.90296 on the testing data [15].
Other research has implemented Deep Learning for plant species classification in an Android-based application and obtained an accuracy of 86% on the validation dataset [16]. Deep Learning has also been used for leaf image identification with the Convolutional Neural Network (CNN) method, which achieved average accuracies of 85% and 90% on 40 test images [10]. Different research has identified 54 images of three types of Ficus plants using an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) [17].
The Convolutional Neural Network (CNN) method has also been used for disease detection on leaves, namely potato leaves, achieving 95% accuracy on training data and 94% on validation data at the 10th epoch with a batch size of 20 [18]. Similar work has been done in the agricultural industry, where a CNN model obtained an accuracy of 85% on testing data and 98.75% on training data when classifying digital images of spices and herbs [19]. Research on succulent plants also shows that the CNN method can classify well. The accuracy obtained is quite high: 93% on testing data with a dataset of 500 images and 88% on data from the internet, in a comparison of models trained on grayscale and color datasets [20]. Optimizing a CNN Transfer Learning model by reducing the image matrix without reducing information achieved an accuracy of 90.5% [21].
Research comparing datasets and using the Support Vector Machine (SVM) method for plant classification has been carried out on 240 Chinese herbal leaves based on ten classes [22]. The use of a Convolutional Neural Network (CNN) for the classification of magnolia plants resulted in an accuracy of 99.37% on training data and 95.89% on testing data [23]. Based on research relevant to leaf classification with Artificial Intelligence, it can be concluded that Artificial Intelligence approaches, including expert systems, have given good results on classic problems [24]. An example of a Deep Learning architecture is the CNN, which can reduce the dimensions of a dataset without eliminating its characteristics or features [25], namely by building a model with the CNN method, which is considered more effective in classifying an object [27].
The architecture model used in this research is VGG16. It is chosen to obtain a higher accuracy value than the previous reference research [14] and is expected to give efficient classification results. The dataset used in this research is sourced from the Indonesian Herb Leaf Dataset 3500. It is hoped that this research will add to the application of herbal leaf classification technology, so that it can be implemented in applications that meet the needs of the community.
The first stage is inputting herbal leaf images for processing. Next, dataset preprocessing is carried out in the form of training dataset augmentation, and the result is fed into the proposed VGG16-CNN method; in this stage the training data is processed through the Convolutional Layer, Pooling Layer, Flatten Layer, and Dense Layer. The last stage evaluates the classification performance of the trained model by testing it on the testing dataset.

Dataset
Referring to the reference research [14], the dataset is divided by percentage into 70% training data, 20% validation data, and 10% testing data.
Each layer of the model produces an output shape and a parameter count. The difference between the previous model [14] and the proposed one is a more complex architecture that uses the pretrained VGG16 model, where the output shapes and parameter counts are higher. In the proposed model, the input image dimensions are changed to 150 x 150, and the filters and convolution process are then applied. Each convolution stage is followed by a Pooling Layer, which changes the dimensions, after which the next convolution stage begins. This Convolution and Pooling sequence repeats several times. After the final Convolution and Pooling stage, the Flatten Layer processes the output of the last Pooling Layer. The output of the Flatten Layer is passed to the Dense Layer. The last Dense Layer has 10 units, matching the number of classes to be classified.
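As a rough illustration of how the output shape and parameter count evolve through the layers described above, the sketch below traces a standard VGG16 convolutional base on a 150 x 150 RGB input, followed by a Flatten layer and a 10-unit Dense output. It assumes the standard VGG16 layout (3 x 3 convolutions with "same" padding, 2 x 2 max pooling) and assumes that the Dense output is attached directly to the Flatten layer; the exact head used in this research may differ. The function name trace_vgg16 is hypothetical.

```python
# Trace output shapes and parameter counts for a VGG16 base on a 150x150 RGB
# input, followed by Flatten and a 10-unit Dense output (assumed head).

# Standard VGG16 blocks: (number of filters, number of conv layers in the block).
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def trace_vgg16(height=150, width=150, channels=3, num_classes=10):
    """Return (rows, total_params); each row is (layer_name, output_shape, params)."""
    rows = []
    total = 0
    in_ch = channels
    for b, (filters, n_conv) in enumerate(VGG16_BLOCKS, start=1):
        for c in range(1, n_conv + 1):
            # 3x3 convolution with "same" padding keeps the spatial size.
            params = 3 * 3 * in_ch * filters + filters  # weights + biases
            rows.append((f"block{b}_conv{c}", (height, width, filters), params))
            total += params
            in_ch = filters
        # 2x2 max pooling with stride 2 halves each side, rounding down.
        height, width = height // 2, width // 2
        rows.append((f"block{b}_pool", (height, width, filters), 0))
    flat = height * width * in_ch
    rows.append(("flatten", (flat,), 0))
    dense_params = flat * num_classes + num_classes
    rows.append(("dense_output", (num_classes,), dense_params))
    total += dense_params
    return rows, total


rows, total = trace_vgg16()
for name, shape, params in rows:
    print(f"{name:14s} {str(shape):18s} {params:>10,d}")
print("total parameters:", f"{total:,d}")
```

Under these assumptions the spatial size shrinks as 150, 75, 37, 18, 9, 4, so the last pooling layer outputs 4 x 4 x 512 and the Flatten layer yields 8192 values. In Keras, a comparable base could be obtained with tf.keras.applications.VGG16(include_top=False, weights="imagenet", input_shape=(150, 150, 3)).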

Data Augmentation
The data augmentation process adds image variations to the dataset. It aims to prevent or reduce overfitting of the model and to improve classification accuracy [14]. Augmentation in this research uses ImageDataGenerator, a preprocessing class available in the Tensorflow library, with the values rescale = 1./255, shear_range = 0.3, zoom_range = 0.3, rotation_range = 30, horizontal_flip = True, vertical_flip = True, and fill_mode = 'nearest'.
Without augmentation, the accuracy obtained is lower than with augmentation, and the model can overfit. This can be seen in the comparison of accuracy values in Table 4.
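The augmentation settings listed above could be wired up roughly as follows. This is a minimal sketch, not the authors' code: the parameter values come from the text, while the function name make_train_generator, the directory argument, the batch size of 32, and the categorical class mode are assumptions made for illustration. TensorFlow is imported lazily inside the function so that the settings can be inspected without it installed.

```python
# Augmentation settings as listed in the text.
AUGMENTATION_PARAMS = {
    "rescale": 1.0 / 255,
    "shear_range": 0.3,
    "zoom_range": 0.3,
    "rotation_range": 30,
    "horizontal_flip": True,
    "vertical_flip": True,
    "fill_mode": "nearest",
}


def make_train_generator(train_dir, target_size=(150, 150), batch_size=32):
    """Build an augmenting generator over a directory with one subfolder per class.

    train_dir, batch_size and class_mode are illustrative assumptions.
    """
    # Lazy import: only needed when a generator is actually built.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(**AUGMENTATION_PARAMS)
    return datagen.flow_from_directory(
        train_dir,
        target_size=target_size,   # resize every image to 150 x 150
        batch_size=batch_size,
        class_mode="categorical",  # one-hot labels for the 10 classes
    )
```

Note that ImageDataGenerator applies these transformations randomly on the fly at each epoch, so the number of files on disk does not change; the model simply sees a different variation of each image every time.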

Training and Testing
This research implements a callback function that is applied to the model training process. The callback stops training when val_accuracy reaches a specified value. After each completed epoch, the model is saved if the val_accuracy metric has improved. The callback is also used to restore the weights learned in the best epoch as the final weights of the model. It is also important for scheduling the learning rate, because an unsuitable learning rate can cause problems as the number of epochs increases.
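The stopping and checkpointing behaviour described above can be sketched in plain Python by replaying a list of per-epoch val_accuracy values. The function name, the example history, and the target of 0.95 are hypothetical; the text does not state the threshold that was actually used.

```python
def replay_callbacks(val_accuracies, target=0.95):
    """Replay the callback logic over per-epoch val_accuracy values.

    Returns (stopped_epoch, best_epoch, best_value) with 1-based epochs.
    The target threshold is an illustrative assumption.
    """
    best_value = float("-inf")
    best_epoch = None
    stopped_epoch = len(val_accuracies)  # runs to the end unless stopped early
    for epoch, val_acc in enumerate(val_accuracies, start=1):
        if val_acc > best_value:
            # Checkpoint: remember the best epoch so its weights can be restored.
            best_value, best_epoch = val_acc, epoch
        if val_acc >= target:
            # Stop training once the specified val_accuracy is reached.
            stopped_epoch = epoch
            break
    return stopped_epoch, best_epoch, best_value


history = [0.71, 0.84, 0.83, 0.91, 0.96, 0.97]
print(replay_callbacks(history, target=0.95))  # (5, 5, 0.96)
```

In Keras, this kind of behaviour is usually assembled from tf.keras.callbacks.ModelCheckpoint(monitor="val_accuracy", save_best_only=True), a small custom Callback that sets model.stop_training in on_epoch_end, and tf.keras.callbacks.LearningRateScheduler for the learning rate; whether the research used exactly these classes is not stated.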
Tests will be carried out in several scenarios such as augmented and un-augmented data, as well as evaluation of classification results in each class.The various scenarios are expected to prove the reliability of the proposed model.

Results and Discussions
This research uses ten herbal plant labels whose data is unstructured, including Belimbing Wuluh, Jambu Biji, Jeruk Nipis, and Kemangi, among others, with a division ratio of 70% training data, 20% validation data, and 10% testing data. Several of the ten labels have leaf characteristics that are similar and difficult for lay people without botanical knowledge to distinguish, for example Sirih leaf with Jambu Biji, Belimbing Wuluh leaf with Kemangi, and Seledri leaf with Pepaya. Therefore, this research uses Deep Learning algorithms to optimize classification performance, so that unstructured data can be processed more effectively and accurately.

Data Processing and Augmentation Results
The preprocessing step splits the "Indonesian Herb Leaf Dataset 3500" dataset into three folders, namely training, validation, and test data. The results of splitting each group are shown in Table 1. The next step is data processing, which applies augmentation to the training and validation data. This step also resizes the images in the training and validation data to 150 x 150 pixels, which is also a criterion for using the VGG16 Transfer Learning architecture [28].
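A 70/20/10 split of this kind can be sketched as below. This is only an illustration under stated assumptions: the function name split_dataset, the fixed random seed, the file names, and the figure of 350 images per class (3500 images spread evenly over 10 classes) are hypothetical and not taken from the research.

```python
import random


def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle items and split them into (train, validation, test) lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for a reproducible split
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical file names for one class, assuming 350 images per class.
files = [f"leaf_{i:03d}.jpg" for i in range(350)]
train, val, test = split_dataset(files)
print(len(train), len(val), len(test))  # 245 70 35
```

Splitting each class folder separately, as in this sketch, keeps the class proportions the same across the training, validation, and test folders.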
Data processing continues with creating a callback function for the model training process. The callback function uses features from the Keras module of the Tensorflow library, and training uses the val_accuracy metric as the criterion for stopping the training process. The model is then trained for 100 epochs. The accuracy of the proposed method is higher than that of the main reference of this research [14]; details can be seen in Table 4. Table 5 shows that using the augmentation process in this research increases the accuracy compared to not using it: 97% with augmentation and 96% without.

Evaluation Result Chart
The Precision, Recall, and F1-Score of the model we designed, with augmentation applied, all reach a value of 0.97. Precision is the ratio between the correctly predicted True Positives (TP) and the total number of data predicted as positive. Recall is the ratio between True Positives (TP) and the total number of data that is actually positive, while F1-Score is the harmonic mean of precision and recall [25]. A value of 0.97 corresponds to 97%; the closer the value is to 100%, the more images the model predicts correctly, and conversely, if it is close to 0%, the model fails in classification. The model designed in this research is shown to increase the accuracy of image classification. For more details, see Table 6. It can be concluded that the method used in this research achieves very good results, with a testing accuracy higher than the main reference journal [14], although the types of herbal leaves used are different.
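The definitions above can be made concrete with a small sketch that derives per-class Precision, Recall, and F1-Score from a confusion matrix. The function name and the 3-class toy matrix are hypothetical and unrelated to the results of this research.

```python
def per_class_metrics(confusion):
    """Compute (precision, recall, f1) for each class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, actually other
        fn = sum(confusion[k]) - tp                       # actually k, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
        results.append((precision, recall, f1))
    return results


# Toy 3-class confusion matrix (rows: true class, columns: predicted class).
toy = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
for label, (p, r, f1) in enumerate(per_class_metrics(toy)):
    print(f"class {label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Averaging these per-class values over all ten classes gives the kind of single summary figure (such as 0.97) reported above; scikit-learn's sklearn.metrics.classification_report produces a comparable table directly from true and predicted labels.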

Figure 1. Block Diagram
Based on the block diagram design in Figure 1, there are three main stages in this research, namely input, process, and output. The dataset, categorized into 10 classes, is first divided into 3 parts: training, validation, and testing data.

Figure 3. Model Accuracy Chart
Figure 3 shows the accuracy graph of the model on the training and validation data, giving the training accuracy and validation accuracy on a scale from 0 to 1. On the graph, the blue line is training and the green line is validation. The training accuracy reaches 0.9673, or 96.73%, at the 100th epoch, while the validation accuracy is more than 0.9563, or 95.63%.

Figure 4. Loss Model Chart
Figure 4 shows the loss values on the training and validation data. The green line is the validation data and the blue line is the training data. In this research, the training loss reaches 0.0975, or 9.75%, and the validation loss reaches 0.1647, or 16.47%.

Accuracy, Recall, and F1-Score
Accuracy, Recall, and F1-Score are computed to test the resulting machine learning model. The results of training the model are stored in the history variable. The model is evaluated through a Classification Report to find out what percentage of the images in the testing data it classifies successfully [29]. The performance on a classification problem can be measured by comparing predicted and actual values, presented in the form of a Confusion Matrix as in Figure 5. This Confusion Matrix displays the accuracy and precision values on the testing data.

The dataset is divided into 70% training data, 20% validation data, and 10% testing data. Details of the dataset distribution can be seen in Table 1.
Table 1. Group and Data Distribution
Sample Data from Herbal Leaf Datasets
Bella Dwi Mardiana, Wahyu Budi Utomo, Ulfah Nur Oktaviana, Galih Wasis Wicaksono, Agus Eko Minarno. Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol. 7 No. 1 (2023). DOI: https://doi.org/10.29207/resti.v7i1.4550. Creative Commons Attribution 4.0 International License (CC BY 4.0)

Model Architecture
This research applies the Transfer Learning method in the design of the proposed model. The model architecture used is VGG16, with the addition of a Fully Connected Layer, namely the Output Layer, sized to match the number of classes in the dataset. The architecture of the proposed model is more complex than the model in previous research [14].

Table 2. Model Architecture

Table 3. Proposed Model Architecture

Table 6
The proposed Transfer Learning method, using the pretrained VGG16 model and augmentation of the training dataset, can recognize and detect the type of herbal leaves correctly in herbal leaf image classification. The entire dataset used in this research is a collection of herbal leaf images divided into training data, validation data, and testing data. The herbal leaf classification process is influenced by the clarity of the leaf image during testing. Another factor that affects classification accuracy is the amount of training data: the more training data is used, the more the model learns and the better the results, but the longer the training time. Classification with the Convolutional Neural Network (CNN) method, using augmentation (Image Data Generator) and an added Fully Connected Layer of Dense type in the proposed model, raises the training accuracy to 96.73% and the testing accuracy to 97% when trained up to the 100th epoch. Without the augmentation process, the testing accuracy drops to 96%.