Parallel Performance Evaluation Tools Workshop: August 16-17, 2022

Abstract

To help computational scientists evaluate the performance of their parallel scientific applications, we present five parallel performance evaluation tools: TAU, PAPI, Score-P, OTF2, and Vampir. This two-day workshop will cover performance evaluation of applications on tri-lab OCF platforms, including CTS-2, and will include consulting sessions. It will focus on performance data collection, analysis, and optimization. After describing and demonstrating how performance data (both profile and trace data) can be collected in a straightforward manner using the automated instrumentation in TAU (Tuning and Analysis Utilities), the bulk of the workshop will cover how to analyze the collected data, drill down to find performance bottlenecks, and determine their causes. The workshop will include sample codes that illustrate the different instrumentation and measurement choices available to users. Topics will include generating performance profiles and traces with memory utilization and headroom, I/O, and hardware performance counter data using PAPI. Hardware counter data can show not only which routines take the most time but also why: for example, because of cache misses, TLB misses, excess address arithmetic, or poor branch prediction. The workshop will cover instrumentation of OpenMP programs using OMPT and measurement of a program’s memory footprint and energy usage. We will demonstrate scalable tracing using OTF2 and visualization using the Vampir trace analysis tool. Performance data analysis using ParaProf and PerfExplorer will be demonstrated using the performance data management framework (TAUdb) that includes TAU’s performance database. The workshop will also feature cross-experiment analysis, including comparing the effects of multi-core architectures on code performance.
The demonstrations will include using TAU with programming paradigms such as ROCm, Intel oneAPI (DPC++/SYCL), OpenCL, OpenACC, CUDA, and OMPT target offloading on HPC systems including Power 9 Linux, AWS, and HPE platforms. New features will include the use of TAU’s LLVM plugin for selective, compiler-based instrumentation. The workshop will also cover using TAU in the ECP Extreme-scale Scientific Software Stack (E4S) [https://e4s.io] with container technology, as well as the use of TAU, Score-P, OTF2, and Vampir on AWS. We will attempt to collect and analyze performance data for additional user codes during the hands-on portion of the workshop. Users and developers are welcome to contact the instructor ahead of time to begin collecting data to discuss at the workshop.

Agenda: All times Pacific

Day 1 (Aug 16, 2022):

9:00am - 10:30am: Introduction to TAU

10:30am - 11:00am: Break

11:00am - 12:30pm: Hands-on: instrumentation, OpenMP (OMPT), PAPI support, GPU runtimes, MPI, ParaProf

12:30pm - 2:00pm: Lunch break

2:00pm - 3:30pm: TAU, E4S, Score-P, Vampir, OTF2, PerfExplorer

3:30pm - 3:45pm: Break

3:45pm - 5:00pm: Hands-on session

5:00pm: Adjourn

Day 2 (Aug 17, 2022):

Pre-arranged meetings and consulting sessions with code teams.

About the instructor:

Sameer Shende serves as a Research Professor and the Director of the Performance Research Laboratory at the University of Oregon and the President and Director of ParaTools, Inc. and ParaTools, SAS, France. He is a technical lead for the TAU Performance System, Extreme-scale Scientific Software Stack (E4S), Program Database Toolkit, and HPC Linux projects. He is the PI of the Programming Models and Runtime (PMR) Software Development Kit (SDK) project for the Exascale Computing Project. His research interests include performance instrumentation, measurement, and analysis tools, software stacks, containers, programming models and runtime systems, compiler optimizations, LLVM, and cloud computing.