Parallel Performance Evaluation Using TAU
Prof. Sameer Shende, University of Oregon and ParaTools, Inc.
To help computational scientists evaluate the performance of their parallel scientific applications, we present the TAU (Tuning and Analysis Utilities) performance system and its interfaces with other tools such as PAPI, Perfetto, OTF2, and Vampir. This two-day workshop covers performance evaluation of applications on Tri-lab OCF platforms and includes consulting sessions. It focuses on performance data collection, analysis, and optimization. After describing and demonstrating how performance data (both profile and trace data) can be collected in a straightforward manner using TAU’s automated instrumentation, the bulk of the workshop will cover how to analyze the collected data and drill down to find performance bottlenecks and determine their causes.
The workshop will include sample codes that illustrate the different instrumentation and measurement choices available to users. Topics will cover generating performance profiles and traces with memory utilization and headroom, I/O, and hardware performance counter data using PAPI. The workshop will demonstrate scalable tracing using OTF2 and visualization using the Vampir trace analysis tool. Performance data analysis using ParaProf and PerfExplorer will be demonstrated using the performance data management framework (TAUdb) that includes TAU’s performance database.
The workshop will also feature cross-experiment analysis, including comparing the effects of multi-core architectures on code performance. The demonstrations will include using TAU with programming paradigms such as ROCm, Intel oneAPI (DPC++/SYCL), OpenCL, OpenACC, and CUDA. They will also show access to hardware performance counters and hardware PC sampling on AMD MI300A GPUs.
The workshop will also cover using TAU in the Extreme-Scale Scientific Software Stack (E4S) [https://e4s.io] using containers and AWS. E4S is a curated, Spack-based collection of HPC and AI/ML tools that includes PETSc, Trilinos, TAU, HPCToolkit, and HDF5, as well as TensorFlow, PyTorch, and other generative AI toolkits available on commercial cloud platforms.
We will attempt to collect and analyze performance data for additional user codes during the hands-on portion of the workshop. Users and developers are welcome to contact the instructor ahead of time to begin collecting data to discuss at the workshop.
The second day of the workshop is dedicated to one-on-one consultation sessions with Dr. Shende for more in-depth instruction and help in addressing performance bottlenecks in your codes. Please see below for information on scheduling an appointment.
Schedule
Day 1: August 27
Location – LLNL Discovery Center (outside the LLNL fence); LVOC B6525 Conference Room
Register below if interested
9:00 PDT - 12:00 PDT:
- Introduction to TAU, E4S, AWS for hands-on
- Instrumentation using tau_exec with MPI, OpenMP OMPT, CUDA, ROCm, Level Zero, and OpenACC
- I/O and memory evaluation
- Hands-on with ParaProf, tau_exec, and Perfetto.dev
12:00 PDT - 1:30 PDT:
- Lunch Break
1:30 PDT - 4:00 PDT:
- Demonstration of analysis tools: ParaProf, TAUdb, PerfExplorer, Vampir, and Jumpshot
- Using TAU on AWS with Adaptive Computing’s ODDC
- Hands-on
Day 2: August 28
Location – Black Diamond Room, B453 Lobby
One-on-one consultation sessions on 8/28 by appointment (schedule by sending email to sameer@paratools.com)
TAU is available on LC systems: https://hpc.llnl.gov/software/development-environment-software/tau-tuning-and-analysis-utilities