Comparison and Optimization of Parallel Clustering Algorithms for Chinese A-Share Stock Segmentation Based on Financial Indicators

Hai Mo; Niu Yihan; Zhang Yuejin

doi:10.29207/joseit.v4i1.6535

Hai Mo, Central University of Finance and Economics, China
Niu Yihan, Central University of Finance and Economics, China
Zhang Yuejin, Shanghai Pudong Development Bank, Kunming, China

Keywords: stock market segmentation, financial indicators, K-means clustering, big data analytics, parallel algorithms

Abstract

This study presents a novel application of parallel clustering algorithms to segment stocks in the Chinese A-share market based on financial indicators. Using the Hadoop platform and the Mahout software library, we implemented the K-means and fuzzy K-means algorithms and compared their performance across five distance measures: Euclidean, squared Euclidean, Manhattan, cosine, and Tanimoto. The analysis used 15 financial indicators from 2,544 listed companies, covering profitability, solvency, growth capability, asset management quality, and shareholder profitability. The experimental results show that, for clustering stock financial data, the K-means algorithm achieves its best execution efficiency and clustering quality with the Tanimoto distance, whereas the fuzzy K-means algorithm performs best with the squared Euclidean distance. Overall, K-means proved more effective: it categorized 1,483 stocks into 26 meaningful segments, compared with only 511 stocks in 27 segments for fuzzy K-means. The resulting segmentation framework divides the market into eight comprehensive categories based on investment value and security, providing investors with practical guidance for stock selection. The approach helps investors understand the fundamental characteristics of each stock segment, discern its distinctive features, and identify undervalued stocks with appreciation potential. This study represents the first application of parallel big data clustering algorithms to segment the entire Chinese A-share market, offering significant practical value for investment decision-making.
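
For readers unfamiliar with the five distance measures named above, the following is a minimal, single-machine Python sketch of their textbook definitions, together with the nearest-centroid assignment step that K-means repeats on every iteration. It is an illustration only and not the study's Hadoop/Mahout implementation. The function names, the `DISTANCES` table, and the helper `nearest_centroid` are hypothetical; the formulas assume non-zero indicator vectors, and Mahout's own distance classes may differ in detail.

```python
import math


def dot(a, b):
    """Inner product of two equal-length indicator vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # 1 - cos(angle); assumes neither vector is all zeros.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def tanimoto(a, b):
    # 1 - a.b / (|a|^2 + |b|^2 - a.b); assumes the vectors are not both zero.
    ab = dot(a, b)
    return 1.0 - ab / (dot(a, a) + dot(b, b) - ab)


# Hypothetical lookup table so a distance measure can be chosen by name.
DISTANCES = {
    "euclidean": euclidean,
    "squared_euclidean": squared_euclidean,
    "manhattan": manhattan,
    "cosine": cosine,
    "tanimoto": tanimoto,
}


def nearest_centroid(point, centroids, distance):
    """Assignment step of K-means: index of the closest centroid."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))
```

In a sketch like this, swapping the distance measure changes only the function passed to `nearest_centroid`, which is why the same clustering algorithm can be compared across all five measures.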
