Comparison and Optimization of Parallel Clustering Algorithms for Chinese A-Share Stock Segmentation Based on Financial Indicators
Abstract
This study applies parallel clustering algorithms to segment stocks in the Chinese A-share market based on financial indicators. Using the Hadoop platform and the Mahout software library, we implemented the K-means and fuzzy K-means algorithms and compared their performance across five distance measures: Euclidean, squared Euclidean, Manhattan, cosine, and Tanimoto. The analysis used 15 financial indicators from 2,544 listed companies, covering profitability, solvency, growth capability, asset management quality, and shareholder profitability. The experimental results show that, for clustering stock financial data, the K-means algorithm achieves its best execution efficiency and clustering quality with Tanimoto distance, whereas the fuzzy K-means algorithm performs best with squared Euclidean distance. Overall, K-means proved more effective: it categorized 1,483 stocks into 26 meaningful segments, compared with only 511 stocks in 27 segments for fuzzy K-means. The resulting segmentation framework divides the market into eight comprehensive categories based on investment value and security, giving investors practical guidance for stock selection. The approach helps investors understand the fundamental characteristics of each stock segment, discern its distinctive features, and identify undervalued stocks with appreciation potential. This study represents the first application of parallel big data clustering algorithms to segment the entire Chinese A-share market, offering significant practical value for investment decision-making.
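For readers unfamiliar with the five distance measures and with the difference between hard and fuzzy assignment, the following is a minimal single-machine Python sketch. It is not the authors' Hadoop/Mahout implementation. The Tanimoto and cosine forms below are the commonly used definitions, the fuzziness exponent `m = 2.0` is an assumed default, and all function names and the sample vectors are hypothetical illustrations rather than data from the study.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def tanimoto(a, b):
    # 1 - (a.b) / (|a|^2 + |b|^2 - a.b); assumes the vectors are not both zero.
    ab = dot(a, b)
    return 1.0 - ab / (dot(a, a) + dot(b, b) - ab)


def hard_assign(point, centroids, dist):
    """K-means style assignment: index of the nearest centroid."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def fuzzy_memberships(point, centroids, dist, m=2.0):
    """Fuzzy K-means style assignment: membership degrees that sum to 1.

    m > 1 is the fuzziness exponent (assumed value, not from the paper).
    """
    d = [dist(point, c) for c in centroids]
    # If the point coincides with one or more centroids, share full
    # membership among them to avoid division by zero.
    zero = [x == 0.0 for x in d]
    if any(zero):
        n = sum(zero)
        return [1.0 / n if z else 0.0 for z in zero]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d[j] / d[k]) ** exponent for k in range(len(d)))
        for j in range(len(d))
    ]


if __name__ == "__main__":
    # Hypothetical two-indicator vectors, for illustration only.
    stock = [0.0, 0.0]
    centroids = [[1.0, 0.0], [3.0, 0.0]]
    print(hard_assign(stock, centroids, euclidean))
    print(fuzzy_memberships(stock, centroids, euclidean))
```

Hard assignment places each stock in exactly one segment, while fuzzy memberships spread it across segments. The sketch does not attempt to explain why the two algorithms segmented different numbers of stocks in the study.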
Copyright (c) 2025 Journal of Systems Engineering and Information Technology (JOSEIT)

This work is licensed under a Creative Commons Attribution 4.0 International License.