
Analyzing Parallel Program Performance Using HPCToolkit

March 13-14, 2019


Presentation:
March 13, 2019    9:00am - 10:00am     B453 R1001 (Armadillo Room)

Individual Meetings (by appointment): 
March 13, 2019    1:00pm - 5:00pm     B451 R1040A  (Green Room)
March 14, 2019   9:00am - 3:00pm     B451 R1040A  (Green Room)
March 14, 2019   3:00pm - 5:00pm     T.B.A.

Note that both locations are Property Protection Areas. Temporary escorted building access procedures for foreign nationals apply.



HPCToolkit is a multi-platform suite of tools for measuring and analyzing the performance of programs running on Linux systems ranging from desktops to supercomputers. HPCToolkit uses asynchronous sampling to measure performance on both CPUs and GPU accelerators. It can be used to identify idleness, inefficiency, and scalability losses within and across compute nodes.

Today, HPCToolkit can be used to measure the performance of applications running on x86_64 TOSS clusters, new IBM platforms such as Sierra, and ARM clusters. Over the last several years, we have been working as part of the CORAL project and the OpenMP standards committee to help shape the hardware and software stack for performance measurement and analysis. As a result of that work, we are in the process of rolling out new capabilities for measuring GPU-accelerated computations within and across compute nodes. Supported programming models include MPI, OpenMP, CUDA, RAJA, and combinations thereof. This talk will describe deployed capabilities in HPCToolkit, emerging capabilities that enable measurement and analysis of GPU-accelerated applications, and capabilities that simplify analysis of applications that use OpenMP with or without accelerators for node-level parallelism. It will also report on work in progress to fully deploy these capabilities.

Following the March 13 presentation, interested parties can schedule one-on-one sessions with the speaker through March 14.  See the Registration section below for details.

More information about HPCToolkit can be found at:

Bio: John Mellor-Crummey is a Professor of Computer Science and Electrical and Computer Engineering at Rice University. He earned a BSE from Princeton University along with MS and PhD degrees from the University of Rochester. He is currently the principal investigator on two multi-institutional partnerships developing software tools for next-generation supercomputers: one as part of the DOE Exascale Computing Project and a second funded by the NSF. In 2013, Mellor-Crummey was named a Fellow of the Association for Computing Machinery “for contributions to parallel and high performance computing.” In 2006, he was a co-recipient of the Edsger W. Dijkstra Prize in Distributed Computing “for an outstanding paper on the principles of distributed computing, whose significance and impact on the theory and/or practice of distributed computing has been evident for at least a decade.”

Registration: No registration is required for the March 13 presentation in the B453 Armadillo Room.

Individual sessions: Interested developers can schedule one-on-one meeting times with the speaker. These individual sessions must be scheduled in advance by contacting Matt Legendre (925-422-6525 /

Fee: No cost
Level/Prerequisites: Familiarity with running parallel programs in an HPC environment is highly recommended.
Questions? Please call or send e-mail to Matt Legendre (925-422-6525 /