Sentiment Analysis of Electricity Company Service Quality Using Naïve Bayes

In the era of technological disruption, PT PLN, a large electricity provider in Indonesia, is digitizing all of its business processes and improving the quality of its customer service. The PLN Mobile application was launched in December 2020 and has been downloaded by 18 million users. It provides various electricity services to its users. With the large volume of opinions now available online, organizations need to know the public perception of their products or services, sales projections, and customer satisfaction. This research identifies public opinion (positive and negative) about the PLN Mobile application through sentiment analysis of review data from the Google Play Store. Sentiment is classified using Naïve Bayes and analyzed along three dimensions of electricity service quality: empathy, responsiveness, and reliability. The results indicate that Naïve Bayes performs reasonably well on binomial labels (positive and negative), with an accuracy of 73%, but reaches only 45% accuracy on the service quality dimensions. Indonesian-language datasets are difficult to process because of non-standard language, foreign words, mixed language variations, and abbreviations. Establishing ground truth through manual labeling requires consistency and skilled personnel who can interpret the context of the text, in order to obtain a model with optimal performance. This study reports the classification of each dimension of electricity service quality in Indonesia based on positive and negative sentiment data from PLN Mobile application users. Reliability received the most negative sentiment, a finding PT PLN can use to improve the reliability of its service to customers.


Introduction
Organizations in the public sector encounter numerous difficulties in a highly competitive economy. The public sector should likewise emphasize service quality and include it in framing strategic decisions. The aforementioned factors necessitate that public sector enterprises in developing nations upgrade their operations and perform more effectively [1].
To face the era of technological disruption, PT PLN, a large electricity provider in Indonesia, is digitizing all of its business processes and improving the quality of its customer service. One initiative is an application that brings the company closer to its customers and offers the best service, experience, and convenience, for example for new electricity installations, power upgrades, token purchases, electricity payments, and complaint submission. The application, named PLN Mobile, was inaugurated in December 2020. Online opinion is crucial data that lets businesses learn about general perceptions of their goods and services, sales projections, and actual customer satisfaction [2]. Based on this information, companies can find opportunities to improve the quality of their goods or services.
Yuli Astuti, Yova Ruldeviyani, Faris Salbari, Aldiansah Prayogi Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) Vol.
Sentiment analysis is a crucial field that lets us understand users' general attitudes toward certain topics, and organizations can use this information to determine consumer satisfaction. Several authors have defined sentiment analysis, but Liu's definition is the one most frequently adopted in the scientific community [3]: sentiment analysis is the field of research that examines people's attitudes, evaluations, and judgments of various objects, including goods, services, organizations, people, situations, events, subjects, and their attributes. Many sentiment analysis methods have been proposed in recent years, and most are built on one of two fundamental techniques: semantic orientation and machine learning. Although both produce good results, several studies in the literature indicate that machine learning produces superior outcomes. More recently, deep learning, a methodology that greatly outperforms conventional approaches, has gained the attention of researchers [4], [5].
Most users comment on PT PLN services through reviews on the Google Play Store. Previous research has analyzed the PLN Mobile application to assess its acceptance and usefulness [6], to analyze user opinions using Twitter data [7], and to group types of user comments [8]. Our research identifies public opinion (positive and negative) about the PLN Mobile application using sentiment analysis of review data from the Google Play Store. We then analyze these opinions so that they can be used to improve the quality of products and services at PT PLN. User comments on the Google Play Store have become management KPIs for service improvement, business process improvement, and the exploration of new services that can improve the customer experience.
Sentiment classification is usually formulated as a two-class problem: positive versus negative [3]. Online reviews of the PLN Mobile application were used as the training and testing data. Because online reviews typically include a rating awarded by the reviewer, such as 1 to 5 stars, the ratings are used to identify the positive and negative categories: for instance, reviews with four or five stars are treated as positive, and reviews with one or two stars as negative. Sentiment analysis is therefore a text classification problem. Here, sentiment or opinion words denote a positive or negative opinion, such as great, extraordinary, dreadful, bad, or terrible. The algorithm used in this research is Naïve Bayes.
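The rating rule above can be sketched as a small labeling function. This is a minimal sketch, assuming ratings are integers from 1 to 5; the text does not say how three-star reviews are handled, so here they are assumed to be left unlabeled (returned as null). The function name and the sample reviews are hypothetical.

```javascript
// Map a star rating (1-5) to a sentiment label: 4-5 stars -> Positive, 1-2 stars -> Negative.
// Assumption: 3-star reviews are ambiguous under this rule and are left unlabeled (null).
function ratingToSentiment(stars) {
  if (!Number.isInteger(stars) || stars < 1 || stars > 5) {
    throw new RangeError(`Invalid star rating: ${stars}`);
  }
  if (stars >= 4) return "Positive";
  if (stars <= 2) return "Negative";
  return null;
}

// Hypothetical reviews, for illustration only
const sampleReviews = [
  { text: "aplikasi sangat membantu", stars: 5 },
  { text: "token tidak masuk", stars: 1 },
  { text: "lumayan", stars: 3 },
];

// Attach labels and drop reviews that the rule leaves unlabeled
const labeledReviews = sampleReviews
  .map((review) => ({ ...review, label: ratingToSentiment(review.stars) }))
  .filter((review) => review.label !== null);

console.log(labeledReviews.map((review) => review.label)); // [ 'Positive', 'Negative' ]
```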
Research related to improving service quality using sentiment analysis has been conducted by [2], which used Twitter data and the SVM and Naïve Bayes algorithms to determine the level of customer satisfaction. The study by [8] used Naïve Bayes to group the types of application complaints, and research by [9] used Naïve Bayes to determine the quality of service of Wi-Fi networks. Our research uses Naïve Bayes to extract the opinions of PLN Mobile application users in order to improve the quality of products and services at PT PLN.
Prior studies have identified the components of electricity service quality. Using exploratory factor analysis to determine service quality dimensions, five components or dimensions were derived from 21 variables. In the context of Sri Lanka, these five qualities (tangibility, empathy, responsiveness, reliability, and assurance) are categorized as dimensions of electrical service quality [1]. Research by [10] identifies three determinants of service quality in Gambian power companies: assurance, responsiveness, and empathy.
Research by [11] uses five service quality dimensions, namely reliability, responsiveness, assurance, empathy, and tangibility, to determine customer satisfaction in Nigerian power companies.
Our research examines three of these factors, chosen to fit the service context of the PLN Mobile application, in the sentiment analysis: empathy, responsiveness, and reliability. Based on this research, PT PLN can develop and implement a strong strategy and business model with professional, consumer-focused service delivery.

Research Methods
This study follows the research methodology shown in Figure 1. Based on the research of … power company using exploratory factor analysis (EFA), and the research of [11] in the Nigerian power company, the following three dimensions of electricity service quality are adopted.
Empathy is the attention given to customers during or after service delivery [10]. Customers are satisfied when the business fully understands their expectations and its customer service can address their issues. As a strategy to promote customer satisfaction, businesses can use empathy to improve communication with their clients and to offer appropriate solutions to consumer needs left unmet by the services supplied.
The company's responsiveness describes employees' spirit in providing services as quickly as possible whenever customers need them [10]. Employees have high enthusiasm for providing services to customers in a timely and hassle-free manner. Employees respond to customer complaints when needed and provide solutions to any customer problems as quickly as possible so that customers are satisfied with the services provided.
Reliability is the provision of consistent service as expected by customers [11]. Reliability leads to customer retention when a service provider can supply services accurately and continuously in line with client expectations.
Overall, service quality is significantly impacted by reliability. The ability of the business to overcome any potential obstacles to expected service and maintain a high error-free rate serves as the foundation for reliability in service quality for customer satisfaction.
Several researchers describe service quality from various perspectives. According to [11], service quality is the connection between client perceptions and expectations of certain service offerings. According to [12], service quality influences consumer attitudes, perceptions, expectations, and satisfaction with services. According to [13], [14], service quality is the customer's overall assessment of service excellence; it also covers the user's overall perception of the relative superiority of the company and its services.
A study by [15] of an Indian electric utility firm specifies seven items, including power quality, supply, mode of utility bill payment, complaint processing, new connectivity, security, inconsistent voltage, and time management for bill payments. The study concludes that the quality of services in the electrical sector needs to be improved and that suitable policy changes must be made, emphasizing the crucial role that excellent customer service plays in raising service quality.

Data Collection
In the era of Big Data, the internet is a data source that can be acquired cheaply and used for a variety of purposes. The data for this research are Google Play Store reviews of the PLN Mobile application, collected from the internet in September 2021. Each review record contains the username used by the reviewer, the date of the review, the star rating given, and the reviewer's comment. The review data were retrieved with the web data extraction tool WebHarvy [16], [17].
When retrieving data with WebHarvy, JavaScript is required to display the full text of reviews that exceed the display limit. Inspecting the button element that expands a truncated review reveals its class names, which are used in script [a]:
var links = document.getElementsByClassName('LkLjZd ScJHi OzU4dc'); // expand buttons, class names found with Inspect
for (var i = 0; i < links.length; i++) links[i].click(); // expand every truncated review before scraping
Rating data are needed to categorize user comments as positive or negative. To retrieve the ratings, the HTML of each review is captured, and a Regular Expression (RegEx) is applied to the rating element's aria-label attribute, anchored on the text that precedes the star number, as shown in Figure 2.
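The rating extraction step can be sketched as follows. This is a minimal sketch under stated assumptions: the aria-label is assumed to read like "Rated 4 stars out of five stars", a wording the text does not give and that may differ or change on Google Play, and the attribute-matching regex assumes double-quoted attributes. The function and constant names are hypothetical.

```javascript
// Assumption: the rating element's aria-label reads like "Rated 4 stars out of five stars".
const RATING_PATTERN = /Rated\s+(\d)\s+stars?\s+out of/i;

// Return the star count as a number, or null when the label does not match
function parseStarRating(ariaLabel) {
  const match = RATING_PATTERN.exec(ariaLabel);
  return match ? Number(match[1]) : null;
}

// Collect aria-label values from captured HTML (assumes double-quoted attributes)
function extractAriaLabels(html) {
  return [...html.matchAll(/aria-label="([^"]*)"/g)].map((m) => m[1]);
}

// Hypothetical captured HTML for two reviews
const sampleHtml =
  '<div role="img" aria-label="Rated 5 stars out of five stars"></div>' +
  '<div role="img" aria-label="Rated 1 star out of five stars"></div>';
const ariaLabels = extractAriaLabels(sampleHtml);
console.log(ariaLabels.map(parseStarRating)); // [ 5, 1 ]
```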

Data Pre-Processing
We manually deleted irrelevant words and made the wording uniform, resulting in a dataset of 1500 reviews. Tokenization is then used to break each sentence into tokens, that is, individual words [18]. After tokenization, data cleaning removes unnecessary punctuation and stop words. Stemming then traces affixed words back to their root form; for Indonesian, stemming methods have been studied and shown to achieve highly accurate results [19], and reducing words to their base forms makes the analysis easier. After obtaining the base words from stemming, we manually labeled each review into one of two classes, "Positive" or "Negative", which are later weighted and classified: "Positive" denotes a positive user review and "Negative" a negative one. Search engines typically rate the degree of relevance between documents and user queries using TF-IDF weighting methods [18]; likewise, once the text has been transformed into uniform base words, the importance of each word is weighted using TF-IDF. Finally, the Naïve Bayes technique is used to perform sentiment analysis on the user comments, categorizing the opinions users express about the application.
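The pre-processing steps above (tokenization, cleaning, stop-word removal, and stemming) can be sketched as a small pipeline. This is a minimal sketch, not the study's implementation: the stop-word list is a tiny hypothetical sample, and stemming is replaced by a hypothetical lookup table rather than an actual Indonesian stemming algorithm.

```javascript
// Hypothetical, deliberately tiny resources; a real pipeline needs a full Indonesian
// stop-word list and a proper Indonesian stemmer instead of a lookup table.
const STOP_WORDS = new Set(["yang", "dan", "di", "ini", "itu", "untuk", "saya"]);
const STEM_LOOKUP = new Map([
  ["pelayanan", "layan"],
  ["membantu", "bantu"],
  ["melaporkan", "lapor"],
  ["pembayaran", "bayar"],
]);

// Tokenization: lowercase the text and split it on whitespace
function tokenize(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Cleaning: strip every non-letter character, then drop empty tokens and stop words
function clean(tokens) {
  return tokens
    .map((token) => token.replace(/[^a-z]/g, ""))
    .filter((token) => token.length > 0 && !STOP_WORDS.has(token));
}

// Stemming stand-in: replace a word by its root when the lookup table knows it
function stem(tokens) {
  return tokens.map((token) => STEM_LOOKUP.get(token) ?? token);
}

function preprocess(text) {
  return stem(clean(tokenize(text)));
}

console.log(preprocess("Pelayanan di aplikasi ini sangat membantu!!"));
// [ 'layan', 'aplikasi', 'sangat', 'bantu' ]
```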

Data Training and Testing
The data are split into a training set and a testing set in an 80%:20% ratio. The training data are used to build a Naïve Bayes classification model, which is then applied to the testing data. The split uses stratified sampling so that each subset keeps the same class proportions as the full dataset.
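A stratified 80%:20% split can be sketched as follows: each class is shuffled and split separately, so both subsets keep the class proportions of the full dataset. This is a minimal sketch; the seeded pseudo-random generator (mulberry32, a small well-known PRNG) is an assumption added only to make the split repeatable, and the sample data are hypothetical.

```javascript
// Small seeded PRNG (mulberry32) so the example split is repeatable
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle on a copy of the array
function shuffle(items, rand) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy;
}

// Split each class separately so train and test keep the original class proportions
function stratifiedSplit(rows, labelOf, trainRatio = 0.8, seed = 42) {
  const rand = mulberry32(seed);
  const byClass = new Map();
  for (const row of rows) {
    const label = labelOf(row);
    if (!byClass.has(label)) byClass.set(label, []);
    byClass.get(label).push(row);
  }
  const train = [];
  const test = [];
  for (const group of byClass.values()) {
    const shuffled = shuffle(group, rand);
    const cut = Math.round(shuffled.length * trainRatio);
    train.push(...shuffled.slice(0, cut));
    test.push(...shuffled.slice(cut));
  }
  return { train, test };
}

// Hypothetical data: 60 positive and 40 negative reviews
const sampleRows = Array.from({ length: 100 }, (_, i) => ({
  id: i,
  label: i < 60 ? "Positive" : "Negative",
}));
const split = stratifiedSplit(sampleRows, (row) => row.label);
console.log(split.train.length, split.test.length); // 80 20
```

With 60 positive and 40 negative reviews, the test set receives 12 positive and 8 negative reviews, the same 60:40 ratio as the full data.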

TF-IDF Weighting
The review text data were weighted using the TF-IDF approach. TF-IDF is short for "term frequency" and "inverse document frequency". The first component, TF, measures how frequently a term appears in a text [18]. For example, suppose a document named "Doc1" contains 1500 words, exactly 10 of which are the word "Good". Because documents range from extremely short to very long, any term can appear more often in longer texts than in shorter ones. To correct for this, the frequency of a term is computed by dividing its number of occurrences in the document by the total number of terms in the document, as in Formula 1:
$$\mathrm{TF}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}} \qquad (1)$$
where $f_{t,d}$ is the number of occurrences of term $t$ in document $d$. For "Good" in "Doc1", TF = 10/1500 ≈ 0.0067. The second component is inverse document frequency. Term frequency alone treats every keyword alike, including stop words such as "from", which can make it inaccurate [18]; different keywords should carry different weights. IDF handles cases such as the word "from", which has little or no value even though it may appear 2000 times in the document: words that appear in many documents receive less weight, and words that appear in few documents receive more. For instance, if there are 10 documents and 5 of them contain the word "application", the inverse document frequency is determined with Formula 2:
$$\mathrm{IDF}(t) = \log \frac{N}{\mathrm{df}_t} \qquad (2)$$
where $N$ is the total number of documents and $\mathrm{df}_t$ is the number of documents containing $t$, which gives IDF("application") = log(10/5) = log 2.
Thus, the more often a word occurs in a document, the higher its term frequency, and the fewer documents a word occurs in, the higher its importance (IDF) as a keyword for the documents that do contain it.
TF-IDF produces numeric data in the form of vectors that represent each text as numbers. Examples of the vectorization results in this research are shown in Table 1: each word receives a TF-IDF weight for each document, which in this case is a review.
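The TF and IDF definitions above can be combined into a small vectorizer that turns tokenized reviews into TF-IDF vectors like those in Table 1. This is a minimal sketch, assuming a base-10 logarithm and unsmoothed IDF; tools such as RapidMiner may use a different log base, smoothing, or normalization, so the numbers will not necessarily match Table 1. The sample reviews are hypothetical.

```javascript
// TF  = occurrences of the term in the document / total terms in the document
// IDF = log10(N / number of documents that contain the term)
// Assumptions: base-10 logarithm, no IDF smoothing, no vector normalization.
function tfidfVectors(docs) {
  const N = docs.length;
  const documentFrequency = new Map();
  for (const tokens of docs) {
    for (const term of new Set(tokens)) {
      documentFrequency.set(term, (documentFrequency.get(term) ?? 0) + 1);
    }
  }
  const vocabulary = [...documentFrequency.keys()].sort();
  const vectors = docs.map((tokens) => {
    const counts = new Map();
    for (const term of tokens) counts.set(term, (counts.get(term) ?? 0) + 1);
    return vocabulary.map((term) => {
      const tf = tokens.length > 0 ? (counts.get(term) ?? 0) / tokens.length : 0;
      const idf = Math.log10(N / documentFrequency.get(term));
      return tf * idf;
    });
  });
  return { vocabulary, vectors };
}

// Hypothetical pre-processed reviews: one token array per review
const sampleDocs = [
  ["aplikasi", "bantu", "cepat"],
  ["aplikasi", "gangguan", "lapor"],
  ["token", "bayar", "mudah"],
  ["aplikasi", "token", "gangguan"],
];
const tfidfResult = tfidfVectors(sampleDocs);
console.log(tfidfResult.vocabulary);
console.log(tfidfResult.vectors[0].map((weight) => weight.toFixed(3)));
```

In the first review, "aplikasi" receives a lower weight than "bantu" because it occurs in three of the four reviews, while "bantu" occurs in only one.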

Classification Using Naïve Bayes
Extracting information from product reviews involves several issues, which can be viewed as problems of uncertainty. The Naïve Bayes classifier is one machine learning technique that can address them. Naïve Bayes is an approach to reasoning under uncertainty that uses an inference-based probabilistic model. It enforces the "naive" assumption that features are conditionally independent given the class. Even so, Naïve Bayes has been shown to perform well on many classification problems, and it is well known for performing exceptionally well despite its straightforward calculations [20].
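The study itself ran Naïve Bayes in RapidMiner; the following is an independent sketch of multinomial Naïve Bayes, a common variant for word counts, to show how the conditional independence assumption turns classification into a sum of per-word log probabilities. This is a minimal sketch, not necessarily what RapidMiner's operator computes; the class name and training data are hypothetical, and Laplace (add-one) smoothing is an assumption.

```javascript
// Multinomial Naive Bayes over token counts with Laplace (add-one) smoothing.
// Log probabilities are summed to avoid numeric underflow on long reviews.
class NaiveBayesText {
  fit(docs, labels) {
    this.classes = [...new Set(labels)];
    this.vocab = new Set(docs.flat());
    this.nDocs = docs.length;
    this.docCount = new Map(); // class -> number of training documents
    this.termCount = new Map(); // class -> Map(term -> count)
    this.totalTerms = new Map(); // class -> total number of tokens
    for (const c of this.classes) {
      this.docCount.set(c, 0);
      this.termCount.set(c, new Map());
      this.totalTerms.set(c, 0);
    }
    docs.forEach((tokens, i) => {
      const c = labels[i];
      const counts = this.termCount.get(c);
      this.docCount.set(c, this.docCount.get(c) + 1);
      this.totalTerms.set(c, this.totalTerms.get(c) + tokens.length);
      for (const t of tokens) counts.set(t, (counts.get(t) ?? 0) + 1);
    });
    return this;
  }

  // log P(c) + sum of log P(token | c) for every class c
  logScores(tokens) {
    const V = this.vocab.size;
    const scores = new Map();
    for (const c of this.classes) {
      const counts = this.termCount.get(c);
      let score = Math.log(this.docCount.get(c) / this.nDocs);
      for (const t of tokens) {
        if (!this.vocab.has(t)) continue; // skip words never seen during training
        score += Math.log(((counts.get(t) ?? 0) + 1) / (this.totalTerms.get(c) + V));
      }
      scores.set(c, score);
    }
    return scores;
  }

  // Pick the class with the highest log score
  predict(tokens) {
    let best = null;
    let bestScore = -Infinity;
    for (const [c, score] of this.logScores(tokens)) {
      if (score > bestScore) {
        best = c;
        bestScore = score;
      }
    }
    return best;
  }
}

// Hypothetical, already pre-processed training reviews
const trainDocs = [
  ["aplikasi", "bantu", "cepat"],
  ["layan", "mudah", "baik"],
  ["gangguan", "lama", "lapor"],
  ["token", "gagal", "lama"],
];
const trainLabels = ["Positive", "Positive", "Negative", "Negative"];
const nbModel = new NaiveBayesText().fit(trainDocs, trainLabels);
console.log(nbModel.predict(["layan", "cepat"])); // Positive
console.log(nbModel.predict(["gangguan", "lama"])); // Negative
```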
Because sentiment words are frequently the dominant element in determining sentiment, it is natural to assume that sentiment words and phrases can be used for sentiment classification in an unsupervised manner. An unsupervised learning strategy of this kind is described in [21].
The lexicon-based method, another unsupervised approach, uses a dictionary of sentiment words and phrases with their associated orientations, combined with rules for intensification and negation, to calculate a sentiment score for each document. This approach was first applied to the sentiment classification of sentences and aspects [22].
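The lexicon-based idea can be illustrated with a short scoring function in which a negator flips the polarity of the following sentiment word and an intensifier scales it. This is a minimal sketch of the general technique, not the method of [22] or of this study; the lexicon, the negators, the intensifiers, and their weights are hypothetical examples.

```javascript
// Hypothetical lexicon: word -> polarity (+1 positive, -1 negative)
const LEXICON = new Map([
  ["baik", 1], ["mudah", 1], ["cepat", 1],
  ["buruk", -1], ["lambat", -1], ["gagal", -1],
]);
const NEGATORS = new Set(["tidak", "bukan", "kurang"]);
const INTENSIFIERS = new Map([["sangat", 2], ["agak", 0.5]]);

// Sum polarities; up to two modifier words right before a sentiment word are applied
function lexiconScore(tokens) {
  let score = 0;
  tokens.forEach((token, i) => {
    const polarity = LEXICON.get(token);
    if (polarity === undefined) return;
    let value = polarity;
    for (let j = i - 1; j >= 0 && i - j <= 2; j--) {
      if (NEGATORS.has(tokens[j])) value = -value; // negation flips the sign
      else if (INTENSIFIERS.has(tokens[j])) value *= INTENSIFIERS.get(tokens[j]); // intensification scales it
      else break; // stop at the first word that is not a modifier
    }
    score += value;
  });
  return score;
}

function lexiconLabel(tokens) {
  const score = lexiconScore(tokens);
  if (score > 0) return "Positive";
  if (score < 0) return "Negative";
  return "Neutral";
}

console.log(lexiconScore(["aplikasi", "sangat", "mudah"])); // 2
console.log(lexiconLabel(["token", "tidak", "masuk", "sangat", "lambat"])); // Negative
```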
Aspects and sentiments are the two factors being classified. If the system is characterized as a generative model, both factors can be said to influence how words are used in sentences; as a result, the value of each variable affects the word probability distribution. In this research, the Naïve Bayes classification was implemented in RapidMiner. After the data are split into two sets and weighted as numerical vectors, the Naïve Bayes algorithm is applied to the training data to produce a classification model.
The model's performance is then validated with cross-validation on the training data: the training data are divided into several folds, and the model is repeatedly trained on all but one fold and evaluated on the held-out fold, to check whether it predicts the correct class for data it was not trained on.
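The k-fold cross-validation procedure described above can be sketched as follows. This is a minimal sketch: the number of folds, the round-robin fold assignment (without stratification), and the `trainFn`/`predictFn` callbacks are assumptions for illustration, and the trivial majority-class model in the usage example stands in for Naïve Bayes only to show the mechanics.

```javascript
// Assign indices 0..n-1 to k folds in round-robin order (no stratification)
function kFoldIndices(n, k) {
  const folds = Array.from({ length: k }, () => []);
  for (let i = 0; i < n; i++) folds[i % k].push(i);
  return folds;
}

// Train on k-1 folds, validate on the held-out fold, repeat for every fold
function crossValidate(rows, labels, k, trainFn, predictFn) {
  if (!Number.isInteger(k) || k < 2 || k > rows.length) {
    throw new RangeError("k must be an integer between 2 and the number of rows");
  }
  const accuracies = kFoldIndices(rows.length, k).map((holdout) => {
    const held = new Set(holdout);
    const trainIdx = rows.map((_, i) => i).filter((i) => !held.has(i));
    const model = trainFn(trainIdx.map((i) => rows[i]), trainIdx.map((i) => labels[i]));
    const correct = holdout.filter((i) => predictFn(model, rows[i]) === labels[i]).length;
    return correct / holdout.length;
  });
  const mean = accuracies.reduce((sum, acc) => sum + acc, 0) / k;
  return { accuracies, mean };
}

// Usage: a trivial majority-class "model" stands in for a real classifier
const majorityClass = (_rows, ys) => {
  const counts = new Map();
  for (const y of ys) counts.set(y, (counts.get(y) ?? 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
};
const cvRows = Array.from({ length: 10 }, (_, i) => i);
const cvLabels = cvRows.map((i) => (i < 7 ? "Positive" : "Negative"));
const cvResult = crossValidate(cvRows, cvLabels, 5, majorityClass, (model) => model);
console.log(cvResult.accuracies, cvResult.mean); // [ 1, 1, 0.5, 0.5, 0.5 ] 0.7
```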
The models were evaluated using several measures: the confusion matrix, accuracy, precision, and recall. The confusion matrix compares the predicted class with the actual class for every instance, counting true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Accuracy is the proportion of correctly classified instances among all examined data; precision is the ratio of true positives to all instances the classifier predicted as positive; and recall is the ratio of true positives to all actual positive instances. These measures are given in Formulas 3, 4, and 5 [23]:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (4)$$
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (5)$$
Once the classification model achieves optimal performance, it is applied to the testing data to predict the class of each test instance. The classification results are then validated using cross-validation to assess their performance.
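Formulas 3 to 5 can be computed from predicted and actual labels as follows. This is a minimal sketch for one class treated as positive; for multinomial labels such as the service quality dimensions, precision and recall are usually computed per class in this one-versus-rest way, while overall accuracy is simply the share of reviews whose predicted label equals the actual one. Returning 0 for an empty denominator is an assumption, and the labels are hypothetical.

```javascript
// Count TP, FP, FN, TN for one class treated as "positive", then apply Formulas 3-5
function evaluate(actual, predicted, positiveLabel) {
  if (actual.length !== predicted.length) {
    throw new Error("actual and predicted must have the same length");
  }
  let tp = 0, fp = 0, fn = 0, tn = 0;
  actual.forEach((a, i) => {
    const p = predicted[i];
    if (p === positiveLabel && a === positiveLabel) tp++;
    else if (p === positiveLabel) fp++;
    else if (a === positiveLabel) fn++;
    else tn++;
  });
  // Assumption: an empty denominator yields 0 instead of NaN
  const safeDiv = (num, den) => (den === 0 ? 0 : num / den);
  return {
    confusion: { tp, fp, fn, tn },
    accuracy: safeDiv(tp + tn, tp + tn + fp + fn),
    precision: safeDiv(tp, tp + fp),
    recall: safeDiv(tp, tp + fn),
  };
}

// Hypothetical labels for five test reviews
const actualLabels = ["Positive", "Positive", "Positive", "Negative", "Negative"];
const predictedLabels = ["Positive", "Positive", "Negative", "Positive", "Negative"];
const evalResult = evaluate(actualLabels, predictedLabels, "Positive");
console.log(evalResult.confusion, evalResult.accuracy); // { tp: 2, fp: 1, fn: 1, tn: 1 } 0.6
```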

Analysis
After classification, the service quality reflected in the 1500 opinions of PLN Mobile application users was analyzed. Each opinion is assigned to one of the dimensions of empathy, responsiveness, and reliability and is also labeled as positive or negative. The negative comments are then passed on to PT PLN as input for improving its services.
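The grouping described above amounts to counting reviews per dimension and sentiment, from which the dimension with the most negative reviews can be read off. This is a minimal sketch, assuming each review already carries one dimension label and one sentiment label; the function names and records are hypothetical and do not reproduce the study's counts.

```javascript
// Cross-tabulate reviews by service quality dimension and sentiment
function crossTab(records) {
  const table = new Map(); // dimension -> { Positive: count, Negative: count }
  for (const { dimension, sentiment } of records) {
    if (sentiment !== "Positive" && sentiment !== "Negative") {
      throw new Error(`Unexpected sentiment label: ${sentiment}`);
    }
    if (!table.has(dimension)) table.set(dimension, { Positive: 0, Negative: 0 });
    table.get(dimension)[sentiment] += 1;
  }
  return table;
}

// Return the dimension with the highest number of negative reviews
function mostNegative(table) {
  let worst = null;
  for (const [dimension, counts] of table) {
    if (worst === null || counts.Negative > table.get(worst).Negative) worst = dimension;
  }
  return worst;
}

// Hypothetical labeled reviews, not the study's data
const sampleRecords = [
  { dimension: "reliability", sentiment: "Negative" },
  { dimension: "reliability", sentiment: "Negative" },
  { dimension: "responsiveness", sentiment: "Positive" },
  { dimension: "responsiveness", sentiment: "Negative" },
  { dimension: "empathy", sentiment: "Positive" },
];
const dimensionTable = crossTab(sampleRecords);
console.log(Object.fromEntries(dimensionTable));
console.log(mostNegative(dimensionTable)); // reliability
```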

Results and Discussions
The experiment was conducted to determine whether the Naïve Bayes classification method is appropriate for the dataset of user reviews of the PLN Mobile application on the Google Play Store. In this case study, the experiment used a multilabel classification approach: each review is given two independent labels, namely sentiment (negative or positive) and service quality dimension (empathy, responsiveness, or reliability). Table 2 shows the amount of review data taken from the Google Play Store, grouped into the three dimensions of electricity service quality. The performance on the training and testing data is shown in Table 3. Testing performance was relatively good on the binomial labels (sentiment) but relatively poor on the multinomial labels (service quality dimensions). The results also indicate that the model did not overfit, since the accuracy on the training data was not much higher than the accuracy on the testing data.
In this case study, overfitting was avoided by using balanced data, a model no more complex than the size of the dataset supports, and cross-validation of the model on the training data. Tables 4 and 5 show the confusion matrices. The findings show that the resulting model is less able to recognize data with empathy labels, most likely because of unbalanced data and inconsistent ground truth. Indonesian datasets pose their own challenges: nonstandard words, foreign words, mixed language variations, and frequent abbreviations of words or phrases. Programming-language libraries for pre-processing Indonesian datasets are also very limited. This requires more effort at the text pre-processing stage, such as writing hard-coded, rule-based cleaning. Examples of customer comments with positive sentiment for each dimension are shown in Table 7, and Table 8 shows examples of negative sentiment for each dimension of service quality. Figure 4 shows the words that appear most often in the sentiment analysis of PLN Mobile application users: 'aplikasi, sangat, listrik, layan, bantu, cepat, mobile, baik, mudah, tugas, lapor, baru, bayar, kasih, langgan, respon, token, gangguan'.
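Hard-coded, rule-based cleaning of the kind mentioned above can be sketched as a normalization step that expands common abbreviations and collapses exaggerated letter repetition. This is a minimal sketch; the abbreviation map is a small hypothetical sample, not the rules used in this study.

```javascript
// Hypothetical sample of abbreviation rules; a real rule set would be much larger
const NORMALIZATION_RULES = new Map([
  ["yg", "yang"],
  ["gk", "tidak"],
  ["ga", "tidak"],
  ["tdk", "tidak"],
  ["sy", "saya"],
  ["udh", "sudah"],
  ["tp", "tapi"],
  ["bgt", "banget"],
  ["dgn", "dengan"],
]);

function normalize(text) {
  return text
    .toLowerCase()
    .replace(/([a-z])\1{2,}/g, "$1") // collapse a letter repeated 3+ times: "lamaaa" -> "lama"
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => NORMALIZATION_RULES.get(word) ?? word) // expand known abbreviations
    .join(" ");
}

console.log(normalize("Sy udh bayar tp token gk masuk, lamaaa bgt"));
// saya sudah bayar tapi token tidak masuk, lama banget
```

Words with punctuation attached, such as "masuk,", are left for the later punctuation-cleaning step.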
In response to these customer comments, the company could simplify the process for new installation services, synchronize the repair of power failures in the field with the work status shown in the application, and repair parts of the power grid that frequently experience outages.
Previous research by [24] analyzed user opinions using Twitter data and ranked power outage complaints in Jakarta by subdistrict. Research by [25] suggests improvements to the user interface of the PLN Mobile application to make it easier for customers to understand. Research by [7] using Twitter data suggests that the customer experience needs to be improved and that service interruptions should be made easier to handle. A study by [6] distributed questionnaires to users and found that 84% of respondents agreed that the PLN Mobile application is beneficial and accepted by the community.

Conclusion
Several conclusions can be drawn from this study. First, the Naïve Bayes algorithm performed reasonably well in this case study, with an accuracy of 73% on binomial labels, here the positive and negative sentiments of PLN Mobile application reviews from the Google Play Store. For multinomial labels, such as the service quality dimensions of empathy, reliability, and responsiveness, the accuracy is only 45%. Second, further research is needed on data cleaning processes such as stemming, because they can change the meaning or context of text data.
Finally, this study reports the classification of each dimension of electricity service quality in Indonesia based on positive and negative sentiment data from PLN Mobile application users. Reliability received the most negative sentiment, which shows that reliability is the dimension of PT PLN's service that customers complain about most. PT PLN can use this finding to improve its service to customers, for example by simplifying the process for new installation services, synchronizing the repair of power failures in the field with the work status shown in the application, and repairing parts of the power grid that frequently experience outages.