Critical Section Overhead Reduction for OpenMP Program by Nesting a Serial Loop to Increase Task Granularity of Parallel Loop
Abstract
This paper presents a simple method to reduce performance loss due to a parallel program's massive critical sections of parallel numerical integration. The method transforms a fine-grain parallel loop into a coarse grain parallel loop that nests a sequential loop. The coarse grain parallel loop is by nesting a loop block to make task granularities coarser than that naive one. In addition to the overhead reduction, the method makes the parallel work fraction significantly more significant than the serial fraction. As a result, nesting a serial loop within a parallel loop improves the parallel program's performance. Compared to the naïve method, which does not scale the performance of a parallel program of numerical integration, the nesting serial loop method scales a parallel program up to 3.26 times a fold relative to its sequential program on a quad-core processor. This result shows that the proposed method makes the parallel program much faster than the naïve method.
Downloads
References
S. Najem N and S. Sami I, "Multi-core Processor : Conceppts And Implementations," International Journal of Computer Science and Information Technology, pp. 01-10, 2018.
A. Rosà, E. Rosales and W. Binder, "Analysis and Optimization of Task Granularity on the Java Virtual Machine," ACM Transaction on Programming Languages and Systems, vol. 41, no. 5, p. 47, 2019.
A. Rosà, E. Rosales and W. Binder, "Analyzing and Optimizing Task Granularity on the JVM," in Association for Computing Machinery, New York, 2018.
J. M. Bull, "Measuring synchronisation and scheduling overheads in OpenMP," in Proceedings of First European Workshop on OpenMP, 1999.
M. Frigo, P. Halpern, C. E. Leiserson, Lewin-Berlin and Stephen, "Reducers and other Cilk++ hyperobjects," in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, 2009.
K. Hasanov and A. Lastovetsky, "Hierarchical redesign of classic MPI reduction algorithms," The Journal of Supercomputing, vol. 73, no. 2, pp. 713-725, 2017.
E. Mohr, D. Kranz and R. Halstead, "Lazy task creation: a technique for increasing the granularity of parallel programs," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 3, pp. 264-280, 1991.
K. Taura, K. Tabata and A. Yonezawa, "StackThreads/MP: Integrating Futures into Calling Standards," ACM SIGPLAN Notice, vol. 34, no. 8, pp. 60-71, 1999.
Adnan and M. Sato, "Efficient Work-Stealing Strategies for Fine-Grain Task Parallelism," in 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, Anchorage, 2011.
Adnan and M. Sato, "Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing," IEICE Transactions on Information and Systems, vol. E95.D, no. 6, pp. 1565-1576, 2012.
A. Fonseca and B. Cabral, "Controlling the granularity of automatic parallel programs," Journal of Computational Science, vol. 17, no. 3, pp. 620-629, 2016.
J. M. Cardoso, J. G. F. Coutinho and P. C. Diniz, "Chapter 5 - Source code transformations and optimizations," in Embedded Computing for High Performance, Boston, Morgan Kaufmann, 2017, pp. 137-183.
J. L. "Gustafson, ""Amdahl's Law"," in "Encyclopedia of Parallel Computing", "Springer US", 2011, pp. "53--60".
C. Jung, D. Lim, J. Lee and S. Han, "Adaptive Execution Techniques for SMT Multiprocessor Architectures," in Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, Chicago.
Adnan, D. K. Oktahidayat and A. Achmad, "Performance Improvement with Non-Uniform Loads on SMT Processors," in 5th International Conference on Computing Engineering and Design (ICCED), 2019.
Copyright (c) 2022 Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright in each article belongs to the author
- The author acknowledges that the RESTI Journal (System Engineering and Information Technology) is the first publisher to publish with a license Creative Commons Attribution 4.0 International License.
- Authors can enter writing separately, arrange the non-exclusive distribution of manuscripts that have been published in this journal into other versions (eg sent to the author's institutional repository, publication in a book, etc.), by acknowledging that the manuscript has been published for the first time in the RESTI (Rekayasa Sistem dan Teknologi Informasi) journal ;