Loop Block Profiling with Performance Prediction
Mohsin Khan, Maaz Ahmed, Waseem Ahmed, Rashid Mehmood, Abdullah Algarni, Aiiad Albeshri, Iyad Katib "Loop Block Profiling with Performance Prediction". International Journal of Computer Trends and Technology (IJCTT) V47(4):199-204, May 2017. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
As the complexity of High Performance Computing (HPC) systems has increased, so has the complexity of the applications that run on them. To achieve better performance by effectively exploiting the parallelism of HPC architectures, we must first analyze program characteristics such as the code hotspot (kernel) and its execution time. A widely cited rule of thumb holds that a program spends roughly 90% of its execution time in less than 10% of its code; optimizing even a small portion of that 10% therefore offers a high probability of significant performance gains. The first task is thus to locate the bottleneck, that is, the region of code that dominates running time, commonly called the hotspot. Profiling answers the question of which portions of the code should be optimized or parallelized to achieve better performance. In this work we develop a lightweight profiler that identifies the hotspots in a program and estimates the maximum speedup that could be achieved if those hotspots were parallelized.
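The relationship between hotspot size and achievable speedup described above can be sketched with Amdahl's law. This is a minimal illustration of the principle, not the paper's prediction model; the function name and parameters are hypothetical.

```python
def amdahl_speedup(hotspot_fraction, n_cores):
    """Upper bound on whole-program speedup when only the hotspot,
    which accounts for hotspot_fraction of total runtime, is
    parallelized perfectly across n_cores."""
    serial = 1.0 - hotspot_fraction          # part that stays sequential
    return 1.0 / (serial + hotspot_fraction / n_cores)

# A hotspot taking 90% of runtime, parallelized across 8 cores:
print(round(amdahl_speedup(0.9, 8), 2))      # 4.71

# Even with unlimited cores, the 10% serial part caps speedup at 10x:
print(round(amdahl_speedup(0.9, 10**9), 2))  # 10.0
```

This is why profiling matters: the fraction of time spent in the hotspot, not the core count, ultimately bounds the benefit of parallelization.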
Keywords: Profiling, Loop Block Profile, Code Analysis, Performance Prediction, Speedup Estimation.