LL-KNN ACW-NB: Local Learning K-Nearest Neighbor in Absolute Correlation Weighted Naïve Bayes for Numerical Data Classification

  • Azminuddin I. S. Azis Universitas Ichsan Gorontalo
  • Budy Santoso Universitas Ichsan Gorontalo
  • Serwin Universitas Ichsan Gorontalo
Keywords: gaussian naive bayes, k-nearest neighbor, absolute correlation coefficient, local learning, attribute weighting

Abstract

The Naïve Bayes (NB) algorithm remains among the top ten data mining algorithms because of its simplicity, efficiency, and performance. To classify numerical data, NB can be combined with a Gaussian distribution (GNB) or a kernel approach (KNB). However, NB treats attributes as independent, an assumption that does not hold in many cases. The Absolute Correlation Coefficient measures correlations between attributes and works on numerical attributes, so it can be used to weight attributes in GNB (ACW-NB). Furthermore, because the performance of NB does not improve on large datasets, ACW-NB can serve as the classifier in a local learning model, in which a classification method well known in local learning, such as K-Nearest Neighbor (K-NN), selects the sub-dataset used to train ACW-NB. To reduce noise and bias, missing value replacement and data normalization can also be applied. The proposed method, termed "LL-KNN ACW-NB" (Local Learning K-Nearest Neighbor in Absolute Correlation Weighted Naïve Bayes), aims to improve the performance of NB (GNB and KNB) in classifying numerical data. The results indicate that LL-KNN ACW-NB improves the performance of NB, with an average accuracy of 91.48%, which is 1.92% better than GNB and 2.86% better than KNB.
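The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the sketch assumes each attribute weight is the absolute Pearson correlation between that attribute and an integer-encoded class label, computed on the K-NN neighborhood of the query. It also assumes the data have already been imputed and normalized. All function names (`pearson`, `nearest_neighbors`, `weighted_gnb_predict`, `ll_knn_acw_nb_predict`) and the variance smoothing term `eps` are hypothetical choices made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def nearest_neighbors(X, y, query, k):
    """Local learning step: keep the k training rows closest to the query."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return [X[i] for i in order], [y[i] for i in order]


def weighted_gnb_predict(X, y, query, eps=1e-6):
    """Gaussian NB whose per-attribute log-likelihoods are scaled by weights.

    Assumption: weight_j = |Pearson correlation| between attribute j and the
    integer-encoded class label. The paper's exact formula may differ.
    """
    n, d = len(X), len(query)
    classes = sorted(set(y))
    codes = [classes.index(label) for label in y]
    weights = [abs(pearson([row[j] for row in X], codes)) for j in range(d)]

    best_label, best_score = None, -math.inf
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        score = math.log(len(rows) / n)  # log prior from class frequency
        for j in range(d):
            col = [row[j] for row in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + eps
            log_pdf = (-0.5 * math.log(2 * math.pi * var)
                       - (query[j] - mu) ** 2 / (2 * var))
            score += weights[j] * log_pdf  # weighted Gaussian log-likelihood
        if score > best_score:
            best_label, best_score = c, score
    return best_label


def ll_knn_acw_nb_predict(X, y, query, k):
    """Train the weighted GNB only on the query's k nearest neighbors."""
    Xk, yk = nearest_neighbors(X, y, query, k)
    return weighted_gnb_predict(Xk, yk, query)
```

Calling `ll_knn_acw_nb_predict(X, y, query, k)` with a list of normalized feature rows `X`, their labels `y`, and one query row returns the predicted label. When every neighbor belongs to the same class, all weights fall to zero and the prediction reduces to that class, which matches plain K-NN behavior in homogeneous neighborhoods.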

 

Published
2020-02-01
Section
Information Technology Articles
