Parallel Clustering Algorithms: Segmenting Chinese A-Share Stocks Using Financial Indicators
Abstract
This study applies parallel clustering algorithms to segment stocks in the Chinese A-share market by their financial indicators. Using the Hadoop platform and the Mahout software library, we implemented K-means and fuzzy K-means and compared their performance under five distance measures: Euclidean, squared Euclidean, Manhattan, cosine, and Tanimoto. The analysis used 15 financial indicators from 2,544 listed companies, covering profitability, solvency, growth capability, asset management quality, and shareholder profitability. Experimental results show that, for clustering stock financial data, K-means achieves its best execution efficiency and clustering quality with the Tanimoto distance, while fuzzy K-means performs best with the squared Euclidean distance. K-means proved more effective overall: it grouped 1,483 stocks into 26 meaningful segments, whereas fuzzy K-means grouped only 511 stocks into 27 segments. The resulting segmentation framework divides the market into eight broad categories based on investment value and security, giving investors practical guidance for stock selection. The approach helps investors understand the fundamental characteristics of each segment, distinguish the segments from one another, and identify undervalued stocks with appreciation potential. This research represents the first application of parallel big data clustering algorithms to segmenting the entire Chinese A-share market and offers practical value for investment decision-making.
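For readers unfamiliar with the five distance measures named above, the following is a minimal Python sketch of their commonly used definitions, together with the nearest-centroid assignment step that K-means repeats on every iteration. It is an illustration only, not the authors' Hadoop/Mahout implementation: the function names, the `DISTANCES` table, and the `assign` helper are hypothetical, and the cosine and Tanimoto forms shown (one minus the respective similarity) are assumed to match the variants used in the study.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; undefined for all-zero vectors.
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def tanimoto_distance(a, b):
    # 1 - Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b).
    # Undefined when both vectors are all-zero.
    ab = dot(a, b)
    return 1.0 - ab / (dot(a, a) + dot(b, b) - ab)


# Hypothetical lookup table so a measure can be chosen by name.
DISTANCES = {
    "euclidean": euclidean,
    "squared_euclidean": squared_euclidean,
    "manhattan": manhattan,
    "cosine": cosine_distance,
    "tanimoto": tanimoto_distance,
}


def assign(point, centroids, distance):
    """Return the index of the centroid closest to `point` (K-means assignment step)."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))
```

Swapping the `distance` argument changes only how "closest" is judged; the rest of the K-means loop (recomputing centroids and repeating until assignments stabilize) is unaffected, which is what makes a side-by-side comparison of measures straightforward.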
Copyright (c) 2025 Journal of Systems Engineering and Information Technology (JOSEIT)

This work is licensed under a Creative Commons Attribution 4.0 International License.