Indonesian Crude Oil Price (ICP) Prediction Using Multiple Linear Regression Algorithm

Crude oil prices play a significant role in the global economy, so accurate prediction of oil prices is very important and a forecasting model is needed. The purpose of this study is to forecast the Indonesian Crude Oil Price (ICP). The data source is a website published by the Ministry of Energy and Mineral Resources (ESDM), which provides monthly crude oil price data; this study focuses on six main types of crude oil: SLC, Attaka, Duri, Belida, Banyu and SC. The data cover a period of 5 years (2018-2022) and are in the form of time series. Dated Brent combined with the Alpha factor for each month and year is the reference in determining the ICP. Future Indonesian crude oil prices are forecast from the historical oil prices of previous periods. The data mining algorithm used for forecasting is Multiple Linear Regression. The dataset is split into 80% training data and 20% testing data. Model accuracy, measured by MAPE, is SLC = 9%, Attaka = 45%, Duri = 126%, Belida = 33%, Banyu = 150% and SC = 50%. Based on these MAPE values, the linear regression equation for predicting the ICP is very good for SLC crude oil, fair for Attaka, Belida and SC crude oil, and poor for Duri and Banyu crude oil.


Introduction
Crude oil plays an important role in the world economy, so accurate prediction of oil prices is very important [1], [2]. Crude oil price fluctuations and their accurate prediction are a concern for many industry practitioners, researchers and policy makers [3], [4]. One of the plantation products that makes up Indonesia's primary exports is palm oil, often known as crude palm oil. In comparison to OPEC nations, Indonesia tends to import oil or has significant local oil demands. Indonesian crude oil is not very competitive or tends to be imported by other nations [5], [6]. The market prospects for palm oil have increased significantly from year to year. Price is an important factor in determining the selling value of a product: it affects producer profits and is decisive for consumers' purchases, so it is very important to monitor and predict the Indonesian Crude Oil Price (ICP). Crude oil is one of the main resources in the energy sector, and the efficiency of this industry partly depends on its price. Industries, governments, and people can all benefit greatly from these price estimates [7]. It is therefore important to buy crude oil at the minimum price, which requires tools for market price forecasting. Such predictions can also be useful in Strategic Trade Theory [8], [9]. Forecasting crude oil prices helps businesses and governments make decisions about the energy market and can reduce the impact of price fluctuations. To accomplish this, we require a program that can assist in estimating the price of palm oil, particularly for the domestic market.
Several studies have been conducted to predict the pattern of the Indonesian Crude Oil Price (ICP). Muhammad Hussein (2020) indicated that oil prices are influenced by two factors, demand and income [10], [11]. Ahmad Fitri Boy (2020) used multiple linear regression to forecast the ICP in the domestic market [12]. This research was conducted to 1) identify the characteristics of crude oil price data in Indonesia, 2) produce forecasting models with data mining algorithms that are suitable for volatile crude oil prices, 3) provide results that can support decision making by stakeholders and 4) add to the knowledge of machine learning in the field of data mining.
The urgency of this research lies in predicting the ICP for future periods based on the ICP in previous periods, where the available data are in the form of time series. Dated Brent combined with the Alpha factor is the reference in determining the ICP. Future Indonesian crude oil prices are forecast from the historical oil prices of previous periods using a data mining algorithm, namely Multiple Linear Regression. Regression is the process of identifying relationships between variables and their impact on the value of an object. The goal of regression analysis is to find the function that best represents the data by reducing the error, or difference, between the predicted and actual values [13]. Multiple Linear Regression is a method for studying the relationship between variables in the data forecasting process, which makes it a suitable algorithm for forecasting problems.

Research Stage
In data mining, information is stored electronically and is searched automatically using computer algorithms. Data mining is the process of solving problems by analyzing data that is already available in databases. For example, a customer database with customer profiles could help solve such a problem [9], [14].
The research method is a sequential process of research activities. In this study, several research steps are carried out, as depicted in Figure 1. The first stage, analyzing the needs, consists of a) defining the goals and plan, b) collecting data, for which the Ministry of Energy and Mineral Resources website provides five years of Indonesian crude oil price data (2018-2022), and c) reviewing the data. The second stage, developing the model, includes a) data preparation and b) model testing.
Data preparation is a very important step in predictive model design. Data preprocessing is the process of cleaning, transforming, and reducing data before analysis. Because data in real databases are frequently incomplete and inconsistent, analyzing them without preprocessing can produce inaccurate data mining results. The following preprocessing steps must therefore be completed to improve the quality of the data to be analyzed: 1) Data cleaning. The collected data may contain many parts that do not fit and some parts that are missing, so a data cleaning process is required. 2) Data transformation. Data transformation converts the data into a suitable form. 3) Data reduction. Analysis of large data sets is difficult, so data reduction techniques are needed to improve storage efficiency and reduce data storage and analysis costs. The degree of deviation or error between the predicted and actual data is known as predictive accuracy [15].
Validation is a very important modelling step for judging how reliable the resulting model is for decision-making. Prediction errors are caused not only by error-inducing factors but also by the inability of the prediction model to recognize other factors in the data set that affect the variance of the predictions. Common measures of error magnitude are MSE, RMSE and MAPE. MSE is the average of the squared difference between the predicted and observed values, RMSE is the square root of MSE, and MAPE is the average of the absolute difference between the predicted and actual values, expressed as a percentage of the actual values [16]. In the deployment stage, the quality and effectiveness of one or more models from the modelling phase are evaluated before they are put to use in the field.

Data Analysis Method
In this study, the first step applied to the dataset was preprocessing. Preprocessing transforms raw data into a form that is more easily understood. This step is important because raw data often do not have a regular format, and data mining algorithms cannot process raw data directly, so preprocessing facilitates the next cycle, which is data analysis.
Preprocessing generates a new dataset. The modelling process is then carried out using the Linear Regression algorithm, one of the prediction algorithms in the field of data mining. The next stage is the performance test, which assesses the accuracy of the predictions (validation stage). The last stage is to test the predictions on the testing data. The proposed design is shown in Figure 2.
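The data flow described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names `make_lagged_rows` and `chronological_split` are hypothetical, the choice of two previous months as predictors is an assumption (the paper does not state its exact predictors), and the prices are made-up numbers. The 80/20 split follows the proportion stated in the abstract.

```python
# Illustrative sketch; helper names and the two-lag feature choice are assumptions.

def make_lagged_rows(series, n_lags=2):
    """Turn a time series into (features, target) rows.

    Each row uses the previous n_lags values as features and the
    current value as the target.
    """
    rows = []
    for t in range(n_lags, len(series)):
        features = series[t - n_lags:t]  # values at t-n_lags .. t-1
        target = series[t]
        rows.append((features, target))
    return rows


def chronological_split(rows, train_fraction=0.8):
    """Split rows in time order: the earliest part trains, the rest tests."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Made-up monthly prices (USD/barrel), for illustration only.
prices = [60.0, 62.0, 61.0, 65.0, 67.0, 66.0,
          70.0, 72.0, 71.0, 75.0, 77.0, 76.0]

rows = make_lagged_rows(prices, n_lags=2)
train_rows, test_rows = chronological_split(rows, train_fraction=0.8)
print(len(train_rows), len(test_rows))
```

Splitting in time order rather than at random keeps later months out of the training set, which is one reasonable way to respect the time series nature of the data.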

Linear Regression
The linear regression model is a reliable tool for analyzing real-world data. Linear regression has many benefits; for example, a linear regression model trains faster than many other predictive models [16], [17], [19]. Linear regression is a statistical method used to determine the strength of the relationship between a dependent variable and one or more independent variables. It can also be used to determine which independent variables are not related to the dependent variable, and which independent variables contain information that is redundant with information about the dependent variable. Linear regression models are easy to use and can be implemented quickly with modest memory resources [18], [19]. Regression analysis is the statistical method used to examine the relationship between two or more variables. The dependent (response) variable is typically measured on a scale, and one or more predictor variables are used to help explain or predict changes in the dependent variable. The relationship between these variables is modelled as a function (equation), for example a linear function. The purpose of regression modelling is to obtain the estimated parameters (coefficients) of the regression model. In its simplest form, the independent variable (X) and the dependent variable (Y) are related by the following equation:
$$Y = a + bX$$
Here, b denotes the slope (direction or beta coefficient), while a represents the intercept. Exactly one line Y = a + bX can be drawn through two points (X1, Y1) and (X2, Y2) with different X coordinates.
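As a small worked example of the two-point case, the sketch below computes the intercept a and slope b of the line Y = a + bX passing through two given points. The function name `line_through` and the sample points are hypothetical and chosen only for illustration.

```python
def line_through(p1, p2):
    """Return (a, b) such that Y = a + b*X passes through both points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # A vertical line cannot be written as Y = a + bX.
        raise ValueError("the two points must have different X coordinates")
    b = (y2 - y1) / (x2 - x1)  # slope
    a = y1 - b * x1            # intercept
    return a, b


# Made-up points (1, 3) and (3, 7).
a, b = line_through((1.0, 3.0), (3.0, 7.0))
print(a, b)  # 1.0 2.0
```

With more than two data points the line generally cannot pass through all of them, and the coefficients are instead chosen to minimize the squared error.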
Multiple linear regression analysis is regression analysis that includes multiple independent variables. The technique is used to determine whether there is a significant relationship between two or more independent variables (X1, X2, X3, ..., Xk) and the dependent variable (Y). The population multiple linear regression model can be written as:
$$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + \varepsilon$$
where a is the intercept, b_1, ..., b_k are the coefficients of the independent variables and \varepsilon is the error term. The magnitude of the model prediction error is the error between the predicted data and the actual data. The error is represented using the Mean Squared Error (MSE), which is the mean of the squared difference between the predicted and observed values; the Root Mean Squared Error (RMSE) is the square root of the MSE. RMSE is a good measure of accuracy, but only for comparing the prediction errors of different models for a particular variable, not between variables, because it depends on the scale of the variable [13], [19], [20].
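One way to estimate the intercept and the coefficients of a multiple linear regression is ordinary least squares via the normal equations. The sketch below is a minimal pure-Python version under stated assumptions: the function names `fit_multiple_linear_regression` and `predict` are hypothetical, the data are made up, and the paper does not state which tool or solver it actually used.

```python
def fit_multiple_linear_regression(X, y):
    """Least-squares fit; returns [a, b1, ..., bk] (intercept first).

    Solves the normal equations (D^T D) c = D^T y, where D is X with a
    leading column of ones, using Gauss-Jordan elimination.
    """
    n = len(X)
    design = [[1.0] + list(row) for row in X]
    m = len(design[0])
    # Augmented matrix [D^T D | D^T y].
    A = [
        [sum(design[r][i] * design[r][j] for r in range(n)) for j in range(m)]
        + [sum(design[r][i] * y[r] for r in range(n))]
        for i in range(m)
    ]
    for col in range(m):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


def predict(coeffs, row):
    """Evaluate a + b1*x1 + ... + bk*xk for one row of predictors."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))


# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in X]

coeffs = fit_multiple_linear_regression(X, y)
print([round(c, 6) for c in coeffs])
```

Because the sample data contain no noise, the fitted coefficients recover the generating values up to floating-point rounding; real price data would leave a non-zero residual.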
The formulas for MSE and RMSE are as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - Y'_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
where Y' = predicted value, Y = actual value and n = number of data points.
The Mean Absolute Percentage Error (MAPE) is obtained by subtracting the forecast from the actual value for each data point, dividing the difference by the actual value, taking the absolute value, summing over all data points, dividing by the number of data points and multiplying by 100. The absolute value of a number is its magnitude, which is positive even if the number itself is negative.
The Mean Absolute Percentage Error (MAPE) formula is as follows:
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{t=1}^{N}\left|\frac{A_t - F_t}{A_t}\right|$$
where A_t = actual value (demand) in period t, F_t = forecast for period t and N = number of forecast data points. The absolute value symbol in the MAPE formula means that a negative result is converted to a positive one.
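The three error measures can be computed directly from paired lists of actual and predicted values. The following is a minimal sketch; the function names and the sample numbers are hypothetical, and the MAPE function assumes that no actual value is zero.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE; expressed in the same unit as the data."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up values, for illustration only.
actual = [100.0, 200.0, 50.0]
predicted = [110.0, 180.0, 55.0]

print(mse(actual, predicted))             # 175.0
print(round(rmse(actual, predicted), 4))
print(round(mape(actual, predicted), 4))  # each forecast is off by 10%
```

Because MAPE divides by the actual value, it is undefined when an actual value is zero and becomes very large when actual values are close to zero, which is worth keeping in mind when comparing MAPE across series of different magnitude.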

Dated Brent and Alpha
Dated Brent provides the basic advantage of being more independent: it is more difficult for particular parties to influence or manipulate because many countries use Brent as their benchmark price. The ICP formula consists of Dated Brent plus Alpha, where Alpha is calculated by considering the suitability of crude oil quality, developments in international crude oil prices and national energy security. Alpha is determined monthly by the Minister of Energy and Mineral Resources.
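The ICP formula (Dated Brent plus Alpha) can be illustrated with a short sketch. The function name `icp_price`, the month labels and all numbers below are made up for illustration and are not taken from the paper's tables; Alpha is assumed to be able to take negative as well as positive values.

```python
def icp_price(dated_brent, alpha):
    """ICP for one crude type and one month: Dated Brent plus Alpha."""
    return dated_brent + alpha


# Made-up monthly values in USD/barrel, for illustration only.
predicted_brent = {"month_1": 93.0, "month_2": 91.0, "month_3": 81.0}
predicted_alpha = {"month_1": -1.5, "month_2": -2.0, "month_3": 0.5}

predicted_icp = {
    month: icp_price(predicted_brent[month], predicted_alpha[month])
    for month in predicted_brent
}
print(predicted_icp)
# {'month_1': 91.5, 'month_2': 89.0, 'month_3': 81.5}
```

Since each crude type has its own Alpha, the same Dated Brent value yields a different ICP for each crude type.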

Data Set
The data were obtained from a website published by the Ministry of Energy and Mineral Resources (ESDM). The data source used in this study is monthly crude oil price data for 55 crude oil types over five years (2018-2022).

Alpha Prediction
The results of Linear Regression for Alpha can be seen in Table 4. The graph in Figure 3 shows the Alpha value of one of the main crude oils, Attaka, in 2022, where the values for January-September are actual and the values for October-December 2022 are predicted.

ICP Price Prediction
The Indonesian Crude Oil Price, specifically for the six main types of crude oil, is predicted by adding up the prediction results for Dated Brent and Alpha. The ICP prediction results for October-December 2022 are shown in Table 6.

Performance Evaluation/Validation
The Root Mean Square Error (RMSE) is a measure of accuracy that shows how well individual pairs of forecast and observed values match on average. Under this measure, the best possible prediction model has an RMSE of 0 (zero). Table 7 displays the analysis of the MAPE value. The MAPE values in Table 8 are above 10% because the data source, namely the decrees of the Minister of Energy and Mineral Resources on the determination of the ICP, contains 30% incomplete data. In the preprocessing stage, the incomplete data were filled in with the average value for the corresponding year.
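The yearly-average completion of missing values mentioned above can be sketched as follows. This is an illustrative sketch only: the function name `fill_missing_with_yearly_mean`, the record layout (year, month, price) and the sample numbers are assumptions, and the paper does not describe its exact procedure in more detail.

```python
def fill_missing_with_yearly_mean(records):
    """Replace missing prices (None) with the mean of that year's known prices.

    records is a list of (year, month, price_or_None) tuples; a new list
    is returned and the input is left unchanged.
    """
    sums, counts = {}, {}
    for year, _month, price in records:
        if price is not None:
            sums[year] = sums.get(year, 0.0) + price
            counts[year] = counts.get(year, 0) + 1

    filled = []
    for year, month, price in records:
        if price is None:
            if year not in counts:
                raise ValueError(f"no known prices for year {year}")
            price = sums[year] / counts[year]
        filled.append((year, month, price))
    return filled


# Made-up records, for illustration only.
records = [
    (2021, 1, 60.0), (2021, 2, None), (2021, 3, 64.0),
    (2022, 1, 80.0), (2022, 2, None),
]
print(fill_missing_with_yearly_mean(records))
```

Filling a gap with the yearly mean flattens any within-year movement for that month, so a high share of filled values can reduce how much the fitted model reflects real monthly variation.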

Conclusion
This study forecasts Indonesian Crude Oil Prices (ICP) using time series data and Multiple Linear Regression.
The results yield the intercept value and the coefficients of variable 1 and variable 2, from which the ICP prediction model is obtained for the SLC, Attaka, Belida, Duri, Banyu and SC crude oil types. Model accuracy, measured by MAPE, is SLC = 9%, Attaka = 45%, Duri = 126%, Belida = 33%, Banyu = 150% and SC = 50%. Based on these MAPE values, the linear regression equation for predicting the ICP is very good for SLC crude oil, fair for Attaka, Belida and SC crude oil, and poor for Duri and Banyu crude oil.