LL-KNN ACW-NB: Local Learning K-Nearest Neighbor in Absolute Correlation Weighted Naïve Bayes for Numerical Data Classification
Abstract
The Naïve Bayes (NB) algorithm remains among the top ten data mining algorithms because of its simplicity, efficiency, and performance. NB can classify numerical data by using a Gaussian distribution (GNB) or a kernel approach (KNB). However, NB treats attributes as independent during classification, an assumption that does not hold in many cases. The absolute correlation coefficient measures correlation between attributes and works on numerical attributes, so it can be used for attribute weighting in GNB (ACW-NB). Furthermore, because the performance of NB does not improve on large datasets, ACW-NB can serve as the classifier in a local learning model, where another classification method that is well known in local learning, such as K-Nearest Neighbor (K-NN), is used to obtain the sub-dataset for ACW-NB training. To reduce noise and bias, missing value replacement and data normalization can also be applied. The proposed method is termed "LL-KNN ACW-NB (Local Learning K-Nearest Neighbor in Absolute Correlation Weighted Naïve Bayes)," and its objective is to improve the performance of NB (GNB and KNB) in classifying numerical data. The results of this study indicate that LL-KNN ACW-NB improves the performance of NB, with an average accuracy of 91.48%, which is 1.92% better than GNB and 2.86% better than KNB.
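The pipeline described above (missing value replacement, normalization, K-NN neighborhood selection, then a correlation-weighted Gaussian NB trained on that neighborhood) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the weighting formula, so the sketch assumes each attribute's weight is the absolute Pearson correlation between that attribute and the integer-coded class label, applied as an exponent on the Gaussian likelihood. Mean imputation, min-max scaling, Euclidean distance, the default `k`, and all function names are likewise assumptions made for illustration.

```python
import numpy as np

def impute_and_normalize(X):
    """Replace NaNs with column means, then min-max scale each column to [0, 1]."""
    X = np.asarray(X, dtype=float).copy()
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant columns
    return (X - lo) / span

def abs_corr_weights(X, y):
    """ASSUMED weighting: |Pearson r| between each attribute and the class label."""
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        # Correlation is undefined when either variable is constant; weight stays 0.
        if X[:, j].std() > 0 and y.std() > 0:
            w[j] = abs(np.corrcoef(X[:, j], y)[0, 1])
    return w

def weighted_gnb_predict(X, y, x, w, eps=1e-6):
    """Gaussian NB where each attribute's log-likelihood is scaled by its weight."""
    y = np.asarray(y)
    best_class, best_score = None, -np.inf
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        var = Xc.var(axis=0) + eps  # eps keeps the variance strictly positive
        log_lik = -0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
        score = np.log(len(Xc) / len(X)) + np.sum(w * log_lik)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def ll_knn_acwnb_predict(X_train, y_train, x, k=15):
    """Local learning: train the weighted GNB only on the k nearest neighbors of x."""
    X_train, y_train = np.asarray(X_train, dtype=float), np.asarray(y_train)
    dist = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance (assumed)
    idx = np.argsort(dist)[:k]
    Xk, yk = X_train[idx], y_train[idx]
    return weighted_gnb_predict(Xk, yk, x, abs_corr_weights(Xk, yk))

# Usage on synthetic data: two Gaussian blobs with a few missing values.
rng = np.random.default_rng(0)
X_raw = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
y_all = np.array([0] * 50 + [1] * 50)
X_raw[3, 0] = np.nan
X_raw[70, 1] = np.nan
X_all = impute_and_normalize(X_raw)
print(ll_knn_acwnb_predict(X_all, y_all, X_all[10]))
```

In this sketch the query is normalized together with the training data for brevity; in practice the scaling parameters would be estimated on the training split and reused for new instances. When every neighbor belongs to one class, the classifier simply returns that class, which matches plain K-NN behavior.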
Copyright (c) 2020 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.