House Prices Segmentation Using Gaussian Mixture Model-Based Clustering
Abstract
A house is a place to live and a basic human necessity. Over the years, demand for housing has grown and diversified, which affects house selling prices, so further study of those prices is needed. This research focuses solely on house price segmentation in DKI Jakarta, using Gaussian Mixture Model (GMM)-based clustering with the Expectation-Maximization (EM) algorithm. The goal is to build a house price segmentation model that yields useful information for potential buyers. The EM algorithm estimates the GMM parameters by maximizing the log-likelihood function. The results show that houses in DKI Jakarta can be segmented into three clusters: the first contains low-profile houses, the second mid-profile houses, and the third high-profile houses. The clustering achieves a silhouette score of 0.60866. Because silhouette scores range from -1 to 1, with values closer to 1 indicating better-separated clusters, this score indicates reasonably good clustering quality.
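To make the method in the abstract concrete, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture followed by a silhouette score. It is not the authors' implementation: the synthetic `prices` values, the quantile-based initialization, the choice of a single feature, and the helper names `fit_gmm_1d`, `assign_clusters`, and `silhouette_score_1d` are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, n_iter=200, tol=1e-8):
    """Fit a k-component 1-D GMM with EM.

    Returns (weights, means, variances, log-likelihood history).
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    variances = [sum((x - centre) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    history = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point,
        # accumulating the log-likelihood of the current parameters.
        resp, log_lik = [], 0.0
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(log_lik)
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor guards against collapse
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
    return weights, means, variances, history


def assign_clusters(data, weights, means, variances):
    """Hard-assign each point to the component with the highest posterior."""
    k = len(weights)
    return [max(range(k), key=lambda j: weights[j] * normal_pdf(x, means[j], variances[j]))
            for x in data]


def silhouette_score_1d(data, labels):
    """Mean silhouette coefficient, using absolute difference as the distance."""
    clusters = {}
    for x, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(x)
    scores = []
    for x, lab in zip(data, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        b = min(sum(abs(x - y) for y in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in for house prices (arbitrary units), not the study's data.
random.seed(0)
prices = ([random.gauss(1.0, 0.3) for _ in range(100)]
          + [random.gauss(3.0, 0.6) for _ in range(100)]
          + [random.gauss(8.0, 1.2) for _ in range(100)])

weights, means, variances, history = fit_gmm_1d(prices, k=3)
labels = assign_clusters(prices, weights, means, variances)
score = silhouette_score_1d(prices, labels)
print("component means:", sorted(round(m, 2) for m in means))
print("silhouette score:", round(score, 3))
```

Each EM iteration does not decrease the log-likelihood, which is why the loop stops once the improvement falls below `tol`. In practice a library implementation with several features and multiple restarts would likely be used instead of this single-feature sketch.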
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) is the first publisher of the work, under a Creative Commons Attribution 4.0 International License.
- Authors may enter into separate arrangements for the non-exclusive distribution of the manuscript published in this journal in other versions (e.g., posting it to the author's institutional repository or publishing it in a book), with an acknowledgment that the manuscript was first published in Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi).