Critical Section Overhead Reduction for OpenMP Program by Nesting a Serial Loop to Increase Task Granularity of Parallel Loop

Keywords: critical section, overhead reduction, OpenMP, task granularity, Parallel loop

Abstract

This paper presents a simple method to reduce performance loss due to a parallel program's massive critical sections of parallel numerical integration. The method transforms a fine-grain parallel loop into a coarse grain parallel loop that nests a sequential loop. The coarse grain parallel loop is by nesting a loop block to make task granularities coarser than that naive one. In addition to the overhead reduction, the method makes the parallel work fraction significantly more significant than the serial fraction. As a result, nesting a serial loop within a parallel loop improves the parallel program's performance. Compared to the naïve method, which does not scale the performance of a parallel program of numerical integration, the nesting serial loop method scales a parallel program up to 3.26 times a fold relative to its sequential program on a quad-core processor. This result shows that the proposed method makes the parallel program much faster than the naïve method.

 

Downloads

Download data is not yet available.

References

S. Najem N and S. Sami I, "Multi-core Processor : Conceppts And Implementations," International Journal of Computer Science and Information Technology, pp. 01-10, 2018.

A. Rosà, E. Rosales and W. Binder, "Analysis and Optimization of Task Granularity on the Java Virtual Machine," ACM Transaction on Programming Languages and Systems, vol. 41, no. 5, p. 47, 2019.

A. Rosà, E. Rosales and W. Binder, "Analyzing and Optimizing Task Granularity on the JVM," in Association for Computing Machinery, New York, 2018.

J. M. Bull, "Measuring synchronisation and scheduling overheads in OpenMP," in Proceedings of First European Workshop on OpenMP, 1999.

M. Frigo, P. Halpern, C. E. Leiserson, Lewin-Berlin and Stephen, "Reducers and other Cilk++ hyperobjects," in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, 2009.

K. Hasanov and A. Lastovetsky, "Hierarchical redesign of classic MPI reduction algorithms," The Journal of Supercomputing, vol. 73, no. 2, pp. 713-725, 2017.

E. Mohr, D. Kranz and R. Halstead, "Lazy task creation: a technique for increasing the granularity of parallel programs," IEEE Transactions on Parallel and Distributed Systems, vol. 2, no. 3, pp. 264-280, 1991.

K. Taura, K. Tabata and A. Yonezawa, "StackThreads/MP: Integrating Futures into Calling Standards," ACM SIGPLAN Notice, vol. 34, no. 8, pp. 60-71, 1999.

Adnan and M. Sato, "Efficient Work-Stealing Strategies for Fine-Grain Task Parallelism," in 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, Anchorage, 2011.

Adnan and M. Sato, "Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing," IEICE Transactions on Information and Systems, vol. E95.D, no. 6, pp. 1565-1576, 2012.

A. Fonseca and B. Cabral, "Controlling the granularity of automatic parallel programs," Journal of Computational Science, vol. 17, no. 3, pp. 620-629, 2016.

J. M. Cardoso, J. G. F. Coutinho and P. C. Diniz, "Chapter 5 - Source code transformations and optimizations," in Embedded Computing for High Performance, Boston, Morgan Kaufmann, 2017, pp. 137-183.

J. L. "Gustafson, ""Amdahl's Law"," in "Encyclopedia of Parallel Computing", "Springer US", 2011, pp. "53--60".

C. Jung, D. Lim, J. Lee and S. Han, "Adaptive Execution Techniques for SMT Multiprocessor Architectures," in Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, Chicago.

Adnan, D. K. Oktahidayat and A. Achmad, "Performance Improvement with Non-Uniform Loads on SMT Processors," in 5th International Conference on Computing Engineering and Design (ICCED), 2019.

Published
2022-04-20
How to Cite
Adnan, A., Intan Sari Areni, & Zulkifli Tahir. (2022). Critical Section Overhead Reduction for OpenMP Program by Nesting a Serial Loop to Increase Task Granularity of Parallel Loop. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 6(2), 207 - 212. https://doi.org/10.29207/resti.v6i2.3848
Section
Information Technology Articles