JURNAL RESTI
Hierarchical Clustering and Deep Learning for Short-Term Load Forecasting with Influenced Factors

Stable and reliable electricity is one of the essential things that must be maintained by the transmission system operator (TSO). This can be achieved when the TSO is able to balance demand and production. To maintain this balance, the TSO must estimate how much demand must be served, so day-ahead short-term load forecasting is an essential step for the TSO. Load forecasting is generally performed with conventional techniques such as least squares and time series models. However, these methods have become outdated as electricity demand has increased significantly over the years. Hence, this paper proposes another approach for short-term load forecasting using a deep neural network, namely Long Short-Term Memory (LSTM). In addition, this paper clusters historical electrical loads into several clusters of similar patterns before forecasting. We also explore other influenced factors on the observed days, such as weather conditions and the human activity cycle represented by holidays, in a neural network-based classification model that predicts the target cluster of electrical loads. The East Java sub-system is used as the test system to investigate the efficacy of the proposed load forecasting method. The simulation results show that the proposed method provides better forecasts on all indicators than the conventional method, with a MaxAPE of around 4.91%, a MAPE of around 2.02%, and an RMSE of 112.08 MW.


Introduction
In the era of Industry 4.0, power systems must provide stable and reliable electricity. To achieve this, the transmission system operator (TSO) must maintain the balance between electricity production and demand [1]. If the TSO fails to maintain this balance, the system is likely to become unstable [1]. In addition, consumers may impose penalties if instability occurs. Hence, it is important to forecast customer load demand so that the TSO can schedule electricity production optimally [2]-[4].
In power systems, load forecasting can be divided into several categories depending on the forecast horizon. Long-term load forecasting is used to predict the peak load over the next ten years. Medium-term load forecasting is used to predict the load over a period of a month to a year. Short-term load forecasting is used when the TSO wants to predict load demand from half an hour up to one week ahead, while very short-term load forecasting covers horizons of less than half an hour [5], [6]. Among these, short-term load forecasting is the most essential, as it plays an important role in the real-time control and security functions of the energy management system. Accurate short-term load forecasting (STLF) can reduce operational costs and keep the system in safe conditions, allowing utilities to schedule production resources, optimize energy prices, and manage trade-offs with producers and consumers. Short-term load forecasts for the next 0.5 to 24 hours are important for the day-to-day operation of electric utilities.
Generally, short-term load forecasting can be solved using conventional methods such as extrapolation, correlation, least squares, stochastic time series (ARMA and ARIMA), and expert systems. The application of extrapolation to short-term load forecasting is reported in [7], where it was found that extrapolation can be used to predict how much load demand the TSO must handle. Srinivasan et al. proposed a multiple correlation model for short-term load forecasting [8] and found that the problem can be handled properly with correlation approaches. The research in [9] applied least squares to short-term load forecasting and showed that, with this approach, the TSO could maintain the balance between production and electricity demand. The application of ARMA model identification to short-term load forecasting is reported in [10]. In [11], non-Gaussian processes are considered in the ARMA model to optimally predict the load demand of power systems, and the results show that the method can be used to forecast load demand. An ARIMA model is used for load forecasting in [12], while the application of an expert system to short-term load forecasting is reported in [13]. Although conventional methods can solve the short-term load forecasting problem, they are considered outdated under present-day conditions [14].
Electricity load demand has increased significantly over the past few years. In addition, the power system has undergone significant changes, with high penetration of intermittent inverter-based power plants such as photovoltaic and wind generation [1]. Hence, conventional approaches are not sufficient to predict load demand under uncertain conditions. To handle this problem, artificial intelligence approaches are essential for short-term load forecasting in modern power systems. Among the numerous artificial intelligence methods, the deep recurrent neural network known as Long Short-Term Memory (LSTM) is becoming increasingly popular. The application of LSTM to predicting the condition of a circulating water pump bearing in a coal-fired power plant is reported in [15], and the results show that LSTM can predict the bearing condition accurately. Dalgkitsis et al. demonstrated the successful application of LSTM to cellular network traffic forecasting [16]. The research in [17] applies LSTM to predicting load demand in a residential area and shows that LSTM is effective for short-term residential load forecasting.
The literature review shows that LSTM is a highly recommended method for short-term load forecasting. However, to make LSTM more accurate, additional steps such as grouping similar load profiles are essential. Grouping by daily temperature (weather conditions) in load forecasting is reported in [18], where grouping the load by daily weather profiles made the forecasts more accurate than without grouping. Quilumba et al. proposed a grouping method based on customer behavior for load forecasting [19], using smart meters to capture the data needed to understand that behavior; this concept allowed load forecasting to be performed more accurately. From these studies, it can be concluded that clustering the load is important for increasing the accuracy of load forecasting. The researchers in [20] proposed a method for clustering data using agglomerative hierarchical clustering (AHC) and showed that the data can be clustered effectively with this method.
This paper proposes applying long short-term memory to short-term load forecasting. To obtain more accurate results, agglomerative hierarchical clustering (AHC) is also applied. The rest of the paper is organized as follows: Section 2 describes the research method, Section 3 presents the results and discussion, and Section 4 presents the conclusions and future work.

Agglomerative Hierarchical Clustering
Agglomerative hierarchical clustering (AHC) is a bottom-up clustering method: it starts from scattered individual data points and repeatedly merges them into larger clusters. AHC can use several linkage criteria to decide which clusters to merge, namely single linkage, complete linkage, and average linkage [20], [21].
Single linkage combines two clusters (e.g., clusters A and B) into one cluster (AB) based on the smallest distance d_AB between them, taken from the distance matrix D = {d_ij}. The distance between clusters can be computed in several ways, but the Euclidean distance is generally used. After the merge, the distances between all clusters, including the new cluster AB, are recalculated, and the pair with the shortest distance is merged next. This process is repeated until the desired number of clusters is reached or all data are merged into a single cluster at the top of the hierarchy. Single linkage can be denoted as in Eq. 1.
Complete linkage defines the distance between two clusters as the largest distance between any pair of their members. The process begins by finding the smallest entry in the distance matrix D = {d_ij} and combining the corresponding clusters into one (e.g., clusters A and B become cluster AB). The distances from the new cluster AB to the other clusters are then recomputed as the largest member-to-member distances, and the pair of clusters with the smallest such distance becomes the next candidate for merging. This process is repeated until the desired number of clusters is reached or all data are merged into a single cluster at the top of the hierarchy. Complete linkage can be denoted as in Eq. 2.
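For reference, the standard textbook update rules for the two linkages are written below: after clusters A and B are merged into AB, the distance from AB to any other cluster C is recomputed as follows. This is the common formulation; the notation of Eq. 1 and Eq. 2 may differ.

```latex
\begin{aligned}
\text{Single linkage (Eq. 1):}\quad   & d_{(AB)C} = \min\{\, d_{AC},\; d_{BC} \,\} \\
\text{Complete linkage (Eq. 2):}\quad & d_{(AB)C} = \max\{\, d_{AC},\; d_{BC} \,\}
\end{aligned}
```

In both cases, the next pair of clusters to merge is the one with the smallest entry in the updated matrix D.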
Commonly used distance measures include the Euclidean, Minkowski (of which Manhattan is a special case), and Mahalanobis distances. In this study, the Euclidean distance was chosen to measure the dissimilarity between objects, so that each cluster groups homogeneous objects.
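The merge procedure described above can be sketched as a naive, pure-Python agglomerative clustering with Euclidean distance and either single or complete linkage. This is an illustrative sketch, not the authors' implementation: the function name `agglomerative` is hypothetical, and it stops at a fixed number of clusters `k` instead of cutting the dendrogram at a chosen height. The toy "daily profiles" in the example are invented four-point vectors, not data from this study.

```python
import math
from typing import List, Sequence


def agglomerative(points: Sequence[Sequence[float]], k: int,
                  linkage: str = "complete") -> List[int]:
    """Naive bottom-up clustering; returns a label in 0..k-1 for each point."""
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Every point starts as its own cluster.
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    # Pairwise Euclidean distances between the original points.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Single linkage uses the closest members, complete linkage the farthest.
    agg = min if linkage == "single" else max

    def cluster_distance(a: List[int], b: List[int]) -> float:
        return agg(dist[i][j] for i in a for j in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge cluster y into cluster x (e.g., A and B become AB); y > x.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical normalized daily profiles (4 samples per day for brevity).
profiles = [
    [0.20, 0.25, 0.30, 0.22],  # low-load day
    [0.21, 0.24, 0.31, 0.23],  # low-load day
    [0.80, 0.85, 0.90, 0.82],  # high-load day
    [0.79, 0.86, 0.91, 0.81],  # high-load day
]
print(agglomerative(profiles, k=2))  # → [0, 0, 1, 1]
```

Recomputing the linkage from member-to-member distances gives the same result as the min/max update rules, at a higher computational cost; for three years of daily profiles, a library routine such as SciPy's hierarchical clustering would be the practical choice.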

Neural Network Classification
A neural network is one approach to classifying data. The idea is inspired by the way the human brain makes decisions. Accordingly, its architecture consists of an input layer, processing (hidden) layers, and an output layer. The number of layers varies widely because it depends on the complexity of the data or the problem to be solved: the more complex the problem, the more layers are used. These layers are supported by several processes, namely initialization, activation, weight training, and iteration. Initialization maps the data to the neurons of the input layer (X_n); it also sets the number of iterations and the initial weights assigned to the connections from each neuron to the next layer (W_1, W_2, ..., W_n). Activation connects neurons and layers, and training continues until the maximum number of iterations is reached or the problem is solved. The activation can be written as Y = f(net), where net = X_1W_1 + X_2W_2 + ... + X_nW_n [22], [23].
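The activation formula above can be made concrete for a single neuron. The sketch below assumes a sigmoid as the activation function f and adds an optional bias term; both are illustrative assumptions, since the text does not specify f, and the function names are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Assumed activation f; squashes net into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(x: Sequence[float], w: Sequence[float], bias: float = 0.0) -> float:
    """Y = f(net), with net = X1*W1 + X2*W2 + ... + Xn*Wn (+ optional bias)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    net = sum(xi * wi for xi, wi in zip(x, w)) + bias
    return sigmoid(net)


# net = 1.0*0.5 + 2.0*(-0.25) = 0, so Y = f(0) = 0.5
print(neuron_output([1.0, 2.0], [0.5, -0.25]))
```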

Forecasting Modelling
Long Short-Term Memory (LSTM) is a development of the Deep Learning Recurrent Neural Network (DLRNN). In general, it is used to predict sequential values, that is, for forecasting. It has been applied in many fields, such as cellular networks [16], power generation [24], bearing condition monitoring [15], and load forecasting [6], [25]. LSTM was developed to solve two problems of the standard RNN [26]: during forward propagation, important information from the beginning of a long sequence is lost; and during backward propagation, the gradients vanish, becoming so small that they no longer contribute to weight updates.
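A deliberately simplified illustration of the vanishing gradient: if each time step scales the backpropagated gradient by at most 0.25 (the maximum derivative of the sigmoid), the bound shrinks geometrically with sequence length. This sketch ignores the recurrent weights entirely, so it shows the trend rather than an exact RNN gradient; `gradient_scale` is a hypothetical helper.

```python
def gradient_scale(steps: int, factor: float = 0.25) -> float:
    """Upper bound on the gradient scale after `steps` sigmoid time steps,
    assuming each step contributes at most `factor` and weights are ignored."""
    return factor ** steps


for t in (5, 10, 20):
    print(t, gradient_scale(t))
```

After 20 steps the bound is already about 9.1e-13, far too small to change the weights, which is the behavior the LSTM cell state is designed to avoid.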
The basic concept of LSTM is to combine new knowledge (short-term memory) with old knowledge (long-term memory) by deleting or updating that knowledge. An LSTM cell consists of a cell state and three main gates: the forget gate, the input gate, and the output gate. The cell state carries information through the whole sequence, from the beginning of the data to the end, during model training. The information in the cell state is modified by the gates, in particular the forget gate and the input gate. The forget gate removes information that is less relevant to the new input data. It applies a sigmoid function to the previous output and the current input, producing values between zero and one.
These values are then multiplied with the old knowledge (the previous cell state): the closer a value is to zero, the more of the corresponding information is forgotten, and the closer it is to one, the more is kept. The input gate updates the cell state by deciding whether existing information should be updated or maintained. It applies a sigmoid and a tanh operation to the previous output and the current input and multiplies the two results; the sigmoid part determines which of the new candidate information is important enough to store. The result of the input gate is added to the result of the forget gate to obtain the updated cell state. The output gate then applies a sigmoid to the previous output and the current input and multiplies it by the tanh of the updated cell state to produce the new output. This output is stored and passed to the next cell as the previous output [6], [27].
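The gate operations described above can be sketched as a single LSTM time step in pure Python. This follows the standard LSTM formulation, with the previous output and current input concatenated before each gate; the parameter names (`W_f`, `b_f`, and so on), the helper functions, and the toy weights of 0.1 are illustrative assumptions, not the trained model from this study.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: Sequence[float], b: Sequence[float]) -> Vector:
    # Computes W @ v + b for a row-major matrix W.
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x: Sequence[float], h_prev: Sequence[float], c_prev: Sequence[float],
              params: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM time step; returns the new output h and cell state c."""
    z = list(h_prev) + list(x)  # previous output concatenated with current input
    f = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]    # forget gate
    i = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]    # input gate
    g = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]  # candidate info
    o = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]    # output gate
    # Keep part of the old memory and add the gated new candidate information.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New output, passed to the next cell as the "previous output".
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy setup: 2 hidden units, 1 input feature, all weights 0.1, zero biases.
hidden, n_in = 2, 1
params = {name: [[0.1] * (hidden + n_in) for _ in range(hidden)]
          for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: [0.0] * hidden for name in ("b_f", "b_i", "b_c", "b_o")})
h, c = [0.0] * hidden, [0.0] * hidden
for x_t in ([0.5], [0.6], [0.4]):  # hypothetical normalized load values
    h, c = lstm_step(x_t, h, c, params)
print(h)
```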

Results and Discussions
Section 1 explains that this study uses two reference datasets to predict the electrical load and thereby increase the efficiency of the use of the required resources. The first is the electrical load dataset alone (conventional method); the second is the electrical load dataset supported by influenced factors such as weather conditions and human activity, preceded by a clustering process (proposed method). Both approaches are processed using LSTM.
The data used in this study consist of electrical load data, recorded every 30 minutes over three years as shown in Fig. 1, and influenced factor data consisting of weather data and human activity data represented by day-off/holiday data. The electrical load data cover the East Java sub-system and were obtained from PLN Group. The weather features used in this study are dew point, temperature, humidity, wind speed, and pressure. The weather data were obtained from 6 Amateur Weather Stations (AWS) spread over the province, which were obtained from the

Conventional Method
The first prediction uses only the collected electrical load data. The dataset has numerical features with very different minimum and maximum ranges, which prevents the model from being trained optimally. Therefore, min-max normalization was applied in this study. The min-max bounds of each feature are set from the values observed in that feature, rounded down for the minimum and up for the maximum. Accordingly, the minimum of the electrical load data is set to 2500 MW and the maximum to 6100 MW. The normalized data are then used to predict the electrical load for the next seven days using LSTM.
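The normalization step can be sketched as follows, using the fixed bounds of 2500 and 6100 stated above. The rounding step of 100 in `rounded_bounds` is an assumption chosen because it reproduces those bounds for values in that range; the function names are hypothetical. The inverse transform is included because LSTM outputs must be mapped back to MW before computing errors.

```python
import math
from typing import Sequence, Tuple

LOAD_MIN = 2500.0  # lower bound from the text (observed minimum, rounded down)
LOAD_MAX = 6100.0  # upper bound from the text (observed maximum, rounded up)


def rounded_bounds(values: Sequence[float], step: float = 100.0) -> Tuple[float, float]:
    # Round the observed minimum down and the maximum up to a multiple of `step`.
    return (math.floor(min(values) / step) * step,
            math.ceil(max(values) / step) * step)


def minmax_scale(value: float, lo: float = LOAD_MIN, hi: float = LOAD_MAX) -> float:
    # Maps [lo, hi] onto [0, 1].
    return (value - lo) / (hi - lo)


def minmax_inverse(scaled: float, lo: float = LOAD_MIN, hi: float = LOAD_MAX) -> float:
    # Maps a model output in [0, 1] back to MW.
    return scaled * (hi - lo) + lo


print(rounded_bounds([2543.2, 4820.0, 6012.9]))  # → (2500.0, 6100.0)
print(minmax_scale(4300.0))                      # → 0.5
```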
Because two types of datasets are used, a common architecture must be determined as a measurable basis for comparison. For the first dataset, 18 LSTM architectures were evaluated to find the best one, with each architecture trained and tested ten times using 1000 iterations. The 18 architectures, presented in Table 1, combine LSTM layers with various numbers of hidden units and different dropout layer coefficients. The best architecture is determined by calculating the Maximum Absolute Percentage Error (MaxAPE) of each test and averaging it over the ten tests. Forecasting uses limited training data, namely four days (4 x 48 data points) to predict the next day (1 x 48 data points), which is sufficient according to the authors' previous research [6], [25]. The results of the LSTM architecture tests are presented in Table 2. These results show that the best LSTM architecture is model number 9, with the smallest average MaxAPE of 7.061%. Its standard deviation is also relatively small, so model number 9 remains the best performer when both the average and the spread of MaxAPE are considered. The selected architecture, presented in Table 1, has seven layers.
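The "four days in, one day out" setup and the architecture selection can be sketched as below. Assumptions: windows advance by one day (the stride is not stated in the text), and ties in mean MaxAPE are broken by the lower standard deviation, which is one reading of how the standard deviation "supports" the choice. The model names and MaxAPE values in the example are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

STEPS_PER_DAY = 48  # 30-minute resolution


def make_windows(series: Sequence[float], input_days: int = 4, output_days: int = 1,
                 stride: int = STEPS_PER_DAY) -> List[Tuple[List[float], List[float]]]:
    """Pairs of (4 x 48 inputs, next 1 x 48 targets), advancing by `stride` steps."""
    n_in = input_days * STEPS_PER_DAY
    n_out = output_days * STEPS_PER_DAY
    pairs = []
    for start in range(0, len(series) - n_in - n_out + 1, stride):
        x = list(series[start:start + n_in])
        y = list(series[start + n_in:start + n_in + n_out])
        pairs.append((x, y))
    return pairs


def select_best_model(maxape_runs: Dict[str, Sequence[float]]) -> str:
    """Lowest mean MaxAPE over repeated runs; ties broken by lower std deviation."""
    return min(maxape_runs,
               key=lambda m: (mean(maxape_runs[m]), pstdev(maxape_runs[m])))


series = [float(i) for i in range(10 * STEPS_PER_DAY)]  # ten days of dummy data
print(len(make_windows(series)))  # → 6
print(select_best_model({"model_8": [7.5, 7.7], "model_9": [7.0, 7.1]}))  # → model_9
```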
The selected model is then evaluated on the testing data, which consist of the last seven days of the dataset; the remaining data are used for training.
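At 48 samples per day, holding out the last seven days means the final 336 points form the test set. A minimal sketch of this split; `split_last_days` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def split_last_days(series: Sequence[float], test_days: int = 7,
                    steps_per_day: int = 48) -> Tuple[List[float], List[float]]:
    """Chronological split: the last `test_days` days are held out for testing."""
    n_test = test_days * steps_per_day
    if len(series) <= n_test:
        raise ValueError("series is too short for the requested test period")
    return list(series[:-n_test]), list(series[-n_test:])


train, test = split_last_days([float(i) for i in range(20 * 48)])
print(len(train), len(test))  # → 624 336
```

The split is chronological rather than random so that the model is never trained on data recorded after the test period.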

Proposed Method
For the second dataset, electrical load prediction is combined with influenced factors, namely weather data and holiday data. Each type of data is preprocessed according to its characteristics. As in the conventional method, the electrical load data are normalized using min-max normalization. The weather data still contained missing values, which were filled using the average of the six AWS datasets, after which min-max normalization was applied. The human activity data are not normalized because they are categorical.
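One possible reading of "using the average value from 6 different AWS datasets" is sketched below: a station's missing reading at a given timestamp is replaced by the mean of the stations that did report at that timestamp. The exact imputation rule is not specified in the text, so this is an assumption, and `fill_with_station_mean` is a hypothetical helper; `None` marks a missing value.

```python
from typing import List, Optional, Sequence


def fill_with_station_mean(
        readings: Sequence[Sequence[Optional[float]]]) -> List[List[Optional[float]]]:
    """readings[s][t] is station s at time t; None marks a missing value.
    Each gap is filled with the mean of the stations available at time t."""
    filled = []
    for station in readings:
        row = []
        for t, value in enumerate(station):
            if value is None:
                available = [s[t] for s in readings if s[t] is not None]
                # Stays None only if every station is missing at time t.
                value = sum(available) / len(available) if available else None
            row.append(value)
        filled.append(row)
    return filled


# Hypothetical temperatures (deg C) from three stations at two timestamps.
print(fill_with_station_mean([[30.0, None], [32.0, 28.0], [None, 26.0]]))
# → [[30.0, 27.0], [32.0, 28.0], [31.0, 26.0]]
```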
The training dataset is processed according to the proposed method described in Fig. 2. Clustering the electrical load data is needed because the load follows different patterns on different types of days. The normalized electrical load training dataset is clustered using AHC with complete linkage, and the clusters are selected according to the height at which the hierarchy is cut. In this study, the number of clusters was set between 5 and 10, as shown in Fig. 3. The clustering assigns a cluster label to each row (Date and Total Demand features), and the result is called the labeled electricity load dataset. A separate LSTM prediction model is then built for each label, and each prediction model forecasts the load in 48 steps per day for one week. The cluster label of each date is also appended to the influenced factor dataset, where it becomes the target feature of the subsequent classification process. Because these data come from different domains, their correlation with the electrical load must be analyzed. The correlation is calculated using Spearman's rank correlation because the data are not normally distributed. Fig. 4 and Fig. 5 show that the combined features are correlated with the electrical load enough to affect it, but not strongly enough to have a direct influence. The authors therefore assume that these features cannot be used as direct predictors of the electrical load and require further analysis or processing.
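Spearman's rank correlation is the Pearson correlation computed on ranks, which is why it does not require normally distributed data. The pure-Python sketch below assigns tied values the average of their ranks; the function names are hypothetical, and a library routine such as SciPy's `spearmanr` would normally be used instead.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank vectors (undefined for constant input)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A monotonic but non-linear relation still has a rank correlation of 1.
print(round(spearman([1, 2, 3, 4], [1, 8, 27, 64]), 6))  # → 1.0
```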
In the forecasting stage, the classification model built from the influenced factor dataset is first applied to the testing dataset to obtain a cluster label for each date. As discussed previously, the target number of clusters is 5 to 10. The optimal number of clusters is determined by comparing the classification results on the testing data with the actual labels. The ability of the influenced factors to act as predictors in the classification process was tested using several methods, as shown in Table 4, which reports performance measurements for the kNN, Neural Network, and SVM methods. The classification simulations were carried out in Orange Data Mining using the default parameters of each method, with the aim of obtaining the best model for determining the electrical load cluster. The test results in Table 4 show that the largest number of clusters that can be classified is six, as illustrated in Fig. 6. They also show that the Neural Network is the best classification method in terms of AUC and precision.
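Comparing predicted cluster labels with the actual labels can be sketched with accuracy and macro-averaged precision. This sketch averages precision over the labels that were actually predicted; Orange's own averaging setting may differ, and AUC (which needs class probabilities) is omitted. The function names and labels are hypothetical.

```python
from typing import Hashable, Sequence


def accuracy(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def macro_precision(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Mean over predicted labels of (correct predictions / all predictions) per label."""
    precisions = []
    for label in sorted(set(predicted)):
        true_for_label = [a for a, p in zip(actual, predicted) if p == label]
        precisions.append(sum(a == label for a in true_for_label) / len(true_for_label))
    return sum(precisions) / len(precisions)


actual_clusters = [1, 1, 2, 2]     # hypothetical true cluster per test date
predicted_clusters = [1, 2, 2, 2]  # hypothetical classifier output
print(accuracy(actual_clusters, predicted_clusters))  # → 0.75
print(macro_precision(actual_clusters, predicted_clusters))
```

Repeating this comparison for each candidate number of clusters (5 to 10) gives one way to pick the cluster count that the influenced factors can still classify reliably.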
Forecasting is carried out using LSTM, assuming that the influenced factor data for the predicted day are accurate. Based on these data, the cluster of the predicted day is classified. Forecasting then uses historical data from the previous 100 days that belong to the same cluster. The simulation results comparing the conventional and proposed methods are shown in Fig. 7 and Fig. 8, and the performance indicators are compared in detail in Table 3. These data indicate that the average MaxAPE on the testing data is lower than that of the conventional method, which uses only electrical load data. The improvement is significant: approximately 15% for MaxAPE and approximately 7% for MAPE. Such a difference can significantly affect the efficiency of preparing the materials and power needed to produce electricity that accurately matches customer demand. The results also show that the electrical load can be predicted more accurately when influenced factors such as weather and holidays are involved, as evidenced by the proposed method, which uses clustering as the reference for prediction.
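The evaluation step can be sketched as two parts: selecting same-cluster history for the forecast day, and computing the three indicators used in the comparison. Assumptions: "the previous 100 days which have the same type of cluster" is read here as the 100 most recent earlier days carrying the forecast day's label, and APE is taken as |actual - forecast| / actual x 100. All function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def same_cluster_history(labels_by_day: Sequence[int],
                         day_profiles: Sequence[Sequence[float]],
                         target_label: int, forecast_day: int,
                         max_days: int = 100) -> List[Sequence[float]]:
    """Up to `max_days` most recent profiles before `forecast_day` with the
    same cluster label, returned oldest first (assumed reading of the text)."""
    picked = [day_profiles[d] for d in range(forecast_day - 1, -1, -1)
              if labels_by_day[d] == target_label][:max_days]
    return picked[::-1]


def ape(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    # Absolute percentage error per step.
    return [abs(a - f) / abs(a) * 100.0 for a, f in zip(actual, forecast)]


def maxape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    return max(ape(actual, forecast))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    errors = ape(actual, forecast)
    return sum(errors) / len(errors)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    # Same unit as the load (MW).
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


actual_mw = [4000.0, 5000.0]    # hypothetical actual load
forecast_mw = [4100.0, 4900.0]  # hypothetical forecast
print(maxape(actual_mw, forecast_mw), mape(actual_mw, forecast_mw),
      rmse(actual_mw, forecast_mw))
```

MaxAPE captures the worst half-hour error, MAPE the average relative error, and RMSE the error in MW, which is why the three are reported together.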

Conclusion
This paper proposes an approach for short-term load forecasting based on Long Short-Term Memory (LSTM), a deep recurrent neural network. To improve accuracy, agglomerative hierarchical clustering (AHC) is used to cluster the load data. The East Java sub-system is used as the test system to assess the efficacy of the proposed method. The simulation results show that the proposed method predicts load demand more accurately than the conventional approach, as seen from the average MaxAPE of 4.91%, while the MAPE and RMSE are around 2.02% and 112.08 MW, respectively. Future research could apply the method to predicting solar irradiance and wind speed for PV and wind power systems. A hybrid fuzzy and deep learning method could also be considered.