NOTE: This training event is in the past. This page has been left up for archival purposes.

Workshop Title:

Parallel Performance Evaluation Using TAU


Thursday, August 24:

10:00 MDT/9:00 PDT - 1:00 MDT/12:00 PDT:

  • Introduction to TAU
  • Instrumentation using tau_exec with MPI, OpenMP (OMPT), CUDA, ROCm, Level Zero, and OpenACC
  • I/O and memory evaluation
  • Hands-on

1:00 MDT/12:00 PDT - 2:30 MDT/1:30 PDT:

  • Lunch Break

2:30 MDT/1:30 PDT - 5:30 MDT/4:30 PDT:

  • Demonstration of analysis tools: ParaProf, TAUdb, PerfExplorer, Vampir, and Jumpshot
  • Using TAU with E4S Singularity containers on AWS
  • Hands-on

Friday, August 25:

  • One-on-one consultation sessions on 8/25 by appointment (schedule by sending email to ).



If attending virtually, you will receive a Webex link. If attending in person, please go to B451 R1025 [White Rm].


To help computational scientists evaluate the performance of their parallel scientific applications, Dr. Sameer Shende of ParaTools, Inc. will present the TAU (Tuning and Analysis Utilities) Performance System and its interfaces to other tools such as PAPI, Score-P, Scalasca, OTF, and Vampir. The first day of the workshop will cover performance evaluation of applications on LLNL, Sandia, and LANL platforms, with a focus on performance data collection, analysis, and optimization. It will describe and demonstrate how both profile and trace data can be collected in a straightforward manner using TAU's automated instrumentation for C++, C, Fortran, and Python codes that use MPI, CUDA, ROCm, Kokkos, OpenMP, OpenCL, OpenACC, and other programming models. It will also show how to analyze the collected data to drill down to performance bottlenecks and determine their causes. The workshop will include hands-on sessions with sample codes that illustrate the different instrumentation and measurement choices available to users. Topics will include generating performance profiles and traces with OpenMP instrumentation using the OpenMP Tools API (OMPT) with Intel compilers, and evaluating memory utilization, I/O, and hardware performance counter data collected with PAPI.

The use of TAU with containerized software distributions, including Singularity containers from the Extreme-scale Scientific Software Stack (E4S), will be demonstrated with ROCm and CUDA on systems with GPU access.

The second day of the workshop is dedicated to one-on-one consultation sessions with Dr. Shende for more in-depth instruction and help in addressing performance bottlenecks in your codes. Please see the Friday, August 25 schedule above for information on scheduling an appointment.


Please register for the workshop below. Webex information will be sent to all virtual attendees.

Registration is required to ensure access to the Webex session and to the HPC machines used in the hands-on exercises.

To participate in the hands-on exercises, participants will need to use their own computing resources to access LLNL/LANL/SNL HPC systems. AWS instances will also be available for those without access to LLNL/LANL/SNL HPC systems.


Participants who do not indicate in-person attendance when registering are assumed to be joining via Webex. In-person class size is limited by room size.