Forecasting Pneumonia Toddler Mortality Using Comparative Model ARIMA and Multilayer Perceptron

Pneumonia is an inflammatory lung disease and the second leading cause of death in Indonesia after Dengue Hemorrhagic Fever (DHF). In 2021, cases increased by 7.8% compared to the previous year, a situation exacerbated by the Covid-19 pandemic. This study aims to build a predictive model of pneumonia under-five mortality for the next period by comparing the ARIMA and Multilayer Perceptron (MLP) methods and selecting the better method for long-term forecasting. The research data cover January 2014 to December 2021, a total of 96 monthly observations. The methods were compared using three error measures: Mean Absolute Deviation (MAD), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE). The results showed that the MLP method was superior to ARIMA. Testing on 28 mortality data points showed that the best method was MLP, with a hidden layer value of 2.2, a learning rate of 0.3, and an error percentage of 1.27%. The total number of pneumonia under-five deaths in 2022 was predicted to be 136. These results can be used for government policy-making related to mortality prevention in the next period.


Introduction
Pneumonia is an inflammatory lung disease with the second highest death rate in Indonesia after Dengue Hemorrhagic Fever (DHF) [1]. It is an acute inflammation of the lung tissue caused by bacteria and viruses [2]. The chance of death from pneumonia is even greater in the 0-5 year age group. Pneumonia continues to claim the lives of toddlers and therefore deserves more attention from the Government, especially in the Province of Bali. Basic Health Research (Riskesdas) reported that in 2007, 15.5% of pneumonia patients died, or 83 children under five every day, making pneumonia the second leading cause of death among children under five in Indonesia. The Indonesian Basic Health Survey (IDHS) found that between 2002 and 2007, the incidence of pneumonia in children under five increased from 7.6% to 11.2%. Bali had the second highest pneumonia rate in Indonesia in 2007 (11.1%), and Denpasar City had the fourth highest pneumonia incidence in Bali (18.73%), with the highest pneumonia coverage of 15.93% in 2012 [3].
Under-five deaths due to pneumonia are expected to increase as the Covid-19 pandemic worsens. One way to monitor and predict spikes in the next period is to apply data mining techniques. Data mining is the procedure of finding information contained in data [4], with the aim of producing better conclusions and supporting accurate decisions [5]. The data mining approach used here is forecasting, a technique for estimating future values based on past data in order to understand future conditions [6]. Forecasting is carried out to reduce the uncertainty of events that may occur in a future period. This study compares the prediction results of each method in order to identify the one with the best accuracy for long-term forecasting of pneumonia mortality data, especially in children under five.
Data mining is widely used by professionals to build predictive models. Time series analysis is a quantitative method for identifying patterns in past data collected over time [7]. The most widely used time series method is ARIMA, an analysis method applied to time series data, namely a set of ordered, usually quantitative, observations made over a certain period [8]. Deep learning comprises many types of algorithms that focus on learning multilevel (non-linear) data representations. One such algorithm that has been used to predict time series data is the Multilayer Perceptron (MLP), which is derived from artificial neural networks (ANN). ARIMA and MLP were chosen for this research because both have been applied successfully to prediction.
Several studies have shown that the ARIMA and MLP methods can be used for prediction. In this study the two methods are compared to find out which gives the smallest prediction error. The pneumonia under-five mortality data were provided by the Denpasar General Hospital; they have never been used for related research and have been confirmed to differ from the data used in other studies. Data preprocessing was done in Excel, the ARIMA method was run with Minitab 19, and the MLP method was run with Weka. For ARIMA, the comparison covers estimating the model parameters, testing the white noise assumption, and determining the accuracy of each ARIMA model. For MLP, it covers estimating the model architecture to obtain a model with good accuracy. The best model from each method is then compared, and the one with the smallest error is chosen for long-term forecasting.
This study applies the ARIMA and MLP methods and compares their predictive results to identify the method with the best accuracy for predicting the future mortality rate of pneumonia in children under five. The forecasting results can serve as a reference for local government decision-making on mortality prevention in the next period.

Research Methods
The research method is the sequence of procedures followed in this study. This research, titled Forecasting Toddler Death Due to Pneumonia Using Comparative Model ARIMA and Multilayer Perceptron, goes through five stages: data selection, visualization, forecasting, forecasting accuracy, and forecasting results. The forecasting produces knowledge and information about the long-term mortality rate of pneumonia under five. Figure 1 is an overview of the applied processes.

Data Selection
Data on under-five mortality due to pneumonia were collected and selected to obtain the data needed for data mining. The collected data are the material for forecasting under-five mortality due to pneumonia using the ARIMA and Multilayer Perceptron comparison methods.
The data used in the study were deaths from pneumonia in children under five from January 2014 to December 2021. The data are monthly, so there are 96 observations in total. They were obtained in the form of death reports, in which the number of deaths was grouped by age and sex. The data are a summary of Excel report data, consisting of one folder containing eight documents, each containing a summary of monthly death data. At the data preprocessing stage, the data are divided into training data and testing data with a ratio of 70%:30%. Forecasting in this study is based only on the number of deaths. Understanding the collected data makes it possible to select the data needed for the subsequent forecasting process.
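The 70%:30% division can be illustrated with a short sketch. It assumes the split is chronological (no shuffling), which is the usual choice for time series; the helper names `monthly_periods` and `chronological_split` are hypothetical and are not part of the Excel-based preprocessing used in this study. Only the month labels are generated, not the mortality counts.

```python
from datetime import date


def monthly_periods(start_year, start_month, count):
    """Return `count` consecutive month-start dates beginning at the given month."""
    periods = []
    for i in range(count):
        offset = (start_month - 1) + i
        periods.append(date(start_year + offset // 12, offset % 12 + 1, 1))
    return periods


def chronological_split(items, train_ratio=0.7):
    """Split a time-ordered sequence without shuffling: earliest part trains, rest tests."""
    n_train = round(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


# 96 monthly periods from January 2014 to December 2021, as described in the text.
periods = monthly_periods(2014, 1, 96)
train, test = chronological_split(periods, train_ratio=0.7)

print(len(train), train[0].isoformat(), train[-1].isoformat())
print(len(test), test[0].isoformat(), test[-1].isoformat())
```

Under these assumptions the sketch yields 67 training months (January 2014 to July 2019) and 29 testing months (August 2019 to December 2021).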

Data Visualization
Data visualization provides a general understanding of the data used in forecasting. In this study the data are presented as graphs.
Data graphs were created using Microsoft Excel from the number of under-five deaths due to pneumonia. The graphs are used to determine the distribution pattern of the initial data before forecasting.

Forecasting
Forecasting is the estimation of future magnitudes based on past data, with the purpose of determining future conditions [24]. In this study, the number of under-five deaths due to pneumonia is forecast using two methods, ARIMA and MLP.
ARIMA prediction in this study uses several steps: model identification, evaluation, and diagnostics. The identification stage is mainly aimed at seeing patterns in the data, especially from the autocorrelation and partial autocorrelation results, and at seeing whether the original data need to be differenced. The evaluation and diagnostic steps are carried out together in Minitab 19; candidate models are suggested by comparing the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) [25]. Figure 2 shows the process flow of the ARIMA algorithm. The process begins by reading the pneumonia mortality dataset, saved in .csv format, and plotting the data to identify its distribution. The next stage is to test the stationarity of the data. The stationarity test has two stages: stationarity in variance and stationarity in the mean. Stationarity in variance can be assessed from a time series plot; if the fluctuations change from period to period, the non-stationarity can be removed by transforming the data to stabilize the variance. This is done with a power transformation governed by a transformation parameter λ (the optimal lambda). Commonly used values of λ and the corresponding transformations are given in Table 1 [26], where Z_t is the initial data.
If the data are stationary in variance after the transformation (optimal λ = 1), the process continues to the next stage.
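For reference, the power transformation described above is usually the Box-Cox transformation. The form below is the standard textbook definition and is given here as an assumption about what Table 1 contains, since the table itself is not reproduced in this text. The simplified forms listed after it are the conventional ones for commonly used values of λ.

```latex
T(Z_t) =
\begin{cases}
\dfrac{Z_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln Z_t, & \lambda = 0
\end{cases}
\qquad
\begin{array}{c|ccccc}
\lambda & -1 & -0.5 & 0 & 0.5 & 1 \\ \hline
\text{transformation} & 1/Z_t & 1/\sqrt{Z_t} & \ln Z_t & \sqrt{Z_t} & Z_t
\end{array}
```

With λ = 1 the data are left unchanged, which is why an optimal λ of 1 indicates that no further variance-stabilizing transformation is needed.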
The stationary data are used to identify candidate models. If the test results for a model are valid, the process continues with the selection of the best ARIMA model; otherwise, parameter estimation is repeated for other models until valid results are obtained. The best ARIMA model that passes the tests is then used for prediction. Once the prediction results are obtained, the data mining process using the ARIMA algorithm is complete.
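The identification stage (differencing for stationarity in the mean, then inspecting the autocorrelations) can be sketched in plain Python. This is a minimal illustration only, not the Minitab 19 procedure used in this study; the function names are hypothetical, and the example series contains placeholder values rather than the mortality data.

```python
def difference(series, lag=1):
    """Return the differenced series y[t] - y[t - lag] (the 'd' step of ARIMA)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    centered = [value - mean for value in series]
    denominator = sum(c * c for c in centered)
    acf = []
    for k in range(max_lag + 1):
        numerator = sum(centered[t] * centered[t - k] for t in range(k, n))
        acf.append(numerator / denominator)
    return acf


# Placeholder values for illustration only, NOT the study's mortality data.
example = [5, 7, 6, 9, 11, 10, 13, 15, 14, 17]

differenced = difference(example)            # first difference, d = 1
acf_values = sample_acf(differenced, max_lag=3)

print(differenced)
print([round(r, 3) for r in acf_values])
```

In practice, the pattern of significant spikes in the ACF and PACF of the differenced series is what suggests candidate orders such as ARIMA (0,1,2) or ARIMA (1,1,1).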
MLP builds on the perceptron model introduced by Rosenblatt in the late 1950s. MLP is among the most commonly used neural network models. A multilayer perceptron has one or more hidden layers between the input and output layers [27]. The algorithm most often applied in multilayer perceptron training is backpropagation. This algorithm is carried out in two steps: forward computing, which calculates the error (loss) between the actual and target outputs, and backward computing, which propagates the error back as a correction of the synaptic weights of all neurons [28].
A Multilayer Perceptron trained with backpropagation has the following characteristics: several inputs, one or more hidden layers with several units, a sigmoid activation function in the hidden layers, and connections between the input layer and the first hidden layer, between successive hidden layers, and between the last hidden layer and the output layer.
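The forward and backward computing steps can be illustrated with a very small network. The sketch below is not the Weka configuration used in this study: the class `TinyMLP`, the single hidden layer with two sigmoid units, the sigmoid output, the window of two past values, and the placeholder series are all assumptions chosen for brevity. The learning rate of 0.3 echoes the value reported for the best model.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and a single sigmoid output unit."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each hidden row holds n_inputs weights followed by one bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        # Output weights: n_hidden weights followed by one bias.
        self.w_output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Forward computing: return the hidden activations and the output."""
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row[:-1], x)) + row[-1])
                  for row in self.w_hidden]
        output = sigmoid(sum(w * h for w, h in zip(self.w_output[:-1], hidden))
                         + self.w_output[-1])
        return hidden, output

    def train_step(self, x, target, learning_rate):
        """Backward computing: one gradient step on 0.5 * (target - output)^2."""
        hidden, output = self.forward(x)
        # Error signal at the output unit.
        delta_out = (output - target) * output * (1.0 - output)
        # Error signals at the hidden units, using the pre-update output weights.
        delta_hidden = [delta_out * self.w_output[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        # Weight and bias corrections.
        for j, h in enumerate(hidden):
            self.w_output[j] -= learning_rate * delta_out * h
        self.w_output[-1] -= learning_rate * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, xi in enumerate(x):
                row[i] -= learning_rate * delta_hidden[j] * xi
            row[-1] -= learning_rate * delta_hidden[j]


def mean_loss(model, samples):
    return sum(0.5 * (t - model.forward(x)[1]) ** 2 for x, t in samples) / len(samples)


# Placeholder normalized values for illustration only, NOT the study's data.
series = [0.1, 0.3, 0.2, 0.5, 0.4, 0.7, 0.6, 0.9]
# Each sample uses two past values to predict the next one.
samples = [(series[i:i + 2], series[i + 2]) for i in range(len(series) - 2)]

model = TinyMLP(n_inputs=2, n_hidden=2, seed=0)
loss_before = mean_loss(model, samples)
for _ in range(500):
    for x, target in samples:
        model.train_step(x, target, learning_rate=0.3)
loss_after = mean_loss(model, samples)

print(loss_before, loss_after)
```

The mean loss after training should be lower than before training, which is the effect of repeatedly propagating the output error back to correct the weights.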
The training of the Multilayer Perceptron with backpropagation proceeds in phases. Phase I is forward propagation: each input unit receives a signal and forwards it to the hidden layer, the hidden layer calculates its activation and sends a signal to the output layer, and the output layer calculates its activation as the network's response to the input. Phase II is backpropagation and weight change: each output unit updates its bias and the weights connecting it to the hidden units, and each hidden unit in turn updates its bias and the weights connecting it to the inputs [29]. Figure 3 is a flow chart of the MLP in this study. The MLP process begins by separating the pneumonia under-five mortality data into training data and testing data. The training data are used to determine the best MLP model, and the testing data are used to test the model obtained from the training data. After the data are split, they are normalized (scaled) to the range 0 to 1 using Equation 1 [30].
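$$x' = \frac{x - a}{b - a} \qquad (1)$$
where x is the original (unnormalized) data, x′ is the normalized data, a is the lowest data value, and b is the highest data value.
Forecasting accuracy is measured with three error measures, the Mean Absolute Deviation (MAD), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE) [30]:
$$\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n}\left|A_t - F_t\right|, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$$
where A_t is the actual value at time t, F_t is the predicted value at time t, and n is the number of predictions involved.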
The lower the error values obtained from the three measures, the better the prediction method.
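The three error measures named in this study (MAD, MSE, and MAPE) can be computed as in the sketch below, which uses their standard textbook definitions. The function names and the example values are placeholders chosen for illustration and are not results from this study.

```python
def mad(actual, predicted):
    """Mean Absolute Deviation: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, so such periods need special handling.
    """
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Placeholder values for illustration only, NOT results from the study.
actual = [10, 20, 40]
predicted = [12, 18, 40]

print(mad(actual, predicted))   # about 1.33
print(mse(actual, predicted))   # about 2.67
print(mape(actual, predicted))  # about 10 (percent)
```

Because MAPE divides by the actual value, months with zero recorded deaths would make it undefined; this is worth checking before applying it to monthly count data.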

Results and Discussions
The data used in the study are pneumonia under-five mortality records in the form of an Excel file. The data are monthly, so there are 96 observations in total, each consisting of two fields: month/year, which states the month of death, and number of deaths, which states the total number of deaths that occurred in that month, from January 2014 to December 2021. The data are normalized to minimize errors by mapping the actual values into the 0-1 interval. Table 2 shows the details of the distribution of the data. The data are divided into training and testing data: training data run from January 2014 to July 2019, and testing data run from August 2019 to December 2021.
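The normalization into the 0-1 interval can be sketched as min-max scaling, with a the lowest and b the highest data value, together with the inverse step needed to turn normalized forecasts back into death counts. The function names and the example values are hypothetical placeholders, not the study's data.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the minimum a and maximum b used."""
    a, b = min(values), max(values)
    if a == b:
        raise ValueError("all values are equal, so min-max scaling is undefined")
    return [(x - a) / (b - a) for x in values], a, b


def denormalize(scaled, a, b):
    """Invert min-max scaling to recover values on the original scale."""
    return [s * (b - a) + a for s in scaled]


# Placeholder monthly counts for illustration only, NOT the study's data.
counts = [4, 8, 17, 10]

scaled, a, b = min_max_normalize(counts)
restored = denormalize(scaled, a, b)

print(scaled)
print(restored)
```

One design choice to keep in mind: computing a and b from the training data only, and reusing them for the testing data, avoids leaking information from the test period into the model.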

Data Flow
The data flow in this study was used to examine and visualize the distribution of the pneumonia under-five mortality data.
The data flow is also used to determine the appropriate forecasting method. The monthly dataset of deaths due to pneumonia in children under five from 2014 to 2021 is shown in Figure 4. The plot shows a recurring, seasonal pattern in under-five mortality. An outlier, that is, a value that lies far from the other data, appears in September 2021 with a total of 17 deaths. The plot indicates that pneumonia under-five mortality tends to decrease at the beginning of the year and to increase from the middle to the end of the year.

ARIMA Models
ARIMA modeling was estimated using the Minitab 19 tool; the resulting ARIMA parameter estimates are described in Tables 3 to 5.

MLP Models
The MLP modeling was estimated using the WEKA tools, which resulted in the MLP modeling parameters described in Table 6 (see also Figure 5). Forecasting results using the MLP method are superior to ARIMA, with a resulting error (MAPE) of 1.27%.
Forecasting on the training portion of the pneumonia under-five mortality dataset produced predictions whose pattern closely follows the pattern of the actual data. This is due to seasonal factors, which lead the forecasts from the training data to reflect current conditions. Comparison with the actual data confirms that the predictions are not far from the actual values. The model obtained from the training data is then applied to the test data with the MLP (Multilayer Perceptron) method, and subsequently used to obtain long-term predictions of pneumonia mortality in children under five.

ARIMA and MLP Analysis Results
The analysis of the mortality dataset shows that MLP is more accurate than ARIMA, with MAD, MSE, and MAPE values of 0.11, 0.16, and 1.27%, respectively. Table 7 shows the comparison of the accuracy of the best models in forecasting the number of under-five deaths from pneumonia. The MLP method has the lowest MSE (0.16) as well as the lowest MAD (0.11) and MAPE (1.27%). On this basis, MLP is selected as the best forecasting method for the number of under-five deaths due to pneumonia.

Long-Term Forecasting Results
The MLP model with a hidden layer value of 2.2 and a learning rate of 0.3 is the best model because it has the smallest MAD and MSE values compared to other models. This model is then used to forecast pneumonia under-five mortality for the next 12 months, from January 2022 to December 2022. Forecasting results are shown in Table 8. Table 8 shows that the largest number of pneumonia under-five deaths is predicted for October, with a total of 16 deaths, and that the total number of pneumonia under-five deaths in 2022 is predicted to be 136. The forecast decreases in February-May 2022 with a total of 33 deaths, increases in June-October 2022 with a total of 67 deaths, and decreases in November-December 2022 with a total of 27 deaths. These results indicate that the number of under-five deaths from pneumonia follows the pattern of the previous year's data, with the highest peak toward the end of the year. A visualization of the long-term forecasting results is shown in Figure 6.
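As a consistency check on the reported figures, the three sub-period totals can be compared with the annual total of 136. Table 8 is not reproduced in this text, so the arithmetic below assumes that the reported subtotals and the annual total are exact; the January value is an inference from those figures, not a number stated in the source.

```latex
\underbrace{33}_{\text{Feb--May}} + \underbrace{67}_{\text{Jun--Oct}} + \underbrace{27}_{\text{Nov--Dec}} = 127,
\qquad
136 - 127 = 9 \quad \text{(implied forecast for January 2022)}
```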

Conclusion
Based on the experiments with the ARIMA method, the ARIMA (0,1,2) and ARIMA (1,1,1) models passed the parameter significance test at the 0.05 level, with the (1,1,1) model showing the lowest significance values and the lowest forecasting error. Based on these results, ARIMA (1,1,1) is the best ARIMA model for predicting mortality in children under five due to pneumonia. For the Multilayer Perceptron method, the best model architecture was obtained with a hidden layer value of 2.2 and a learning rate of 0.3. Long-term forecasting shows that the number of pneumonia under-five deaths in 2022 is predicted to be 136.
The comparison of ARIMA (Autoregressive Integrated Moving Average) and MLP (Multilayer Perceptron) has not previously been used to predict the mortality rate of pneumonia under five. The two methods have been compared in other studies, but with different datasets or variables, different study years, and different research instruments. The results of this research can inform local government policy decisions on mortality prevention in the following years.
Further research can develop the forecasts using more recent datasets and more detailed data, such as daily or weekly data, since the data obtained in this study were limited. The forecasting can also be extended to a larger scale, for example a province-wide case study, and to additional death characteristics. A more diverse set of forecasting methods should also be tested to obtain predictions with better accuracy.