NOTE: This training event is in the past. This page has been left up for archival purposes.

Professor John Mellor-Crummey will be visiting LLNL on Monday, August 14th and Tuesday, August 15th to talk about HPCToolkit, a performance analysis tool for HPC systems such as El Capitan. John will give a talk and will be available for 1:1 consulting sessions.

The talk will be on Monday, August 14th in B453 R1012, the Black Diamond Room, from 1:00pm – 2:00pm.

Consulting sessions will be available Monday, August 14th and Tuesday, August 15th. They can be used to try HPCToolkit on your applications and examine their performance with Dr. Mellor-Crummey’s assistance. Sessions can be booked by emailing Matthew LeGendre.

Details

Title: HPCToolkit: Tools for Performance Measurement and Analysis at Exascale

Presenter: John Mellor-Crummey, Rice University

Location: B453 R1012, Black Diamond Room

Time: Monday, August 14th, 1:00pm – 2:00pm

Webex link: https://llnlfed.webex.com/llnlfed/j.php?MTID=mae39de3ea208534794c6a45f9406d9fb

Abstract: Rice University has been extending its HPCToolkit performance tools to measure and analyze the performance of GPU-accelerated applications on emerging exascale parallel systems, including LLNL’s forthcoming El Capitan. HPCToolkit supports collection of instruction-level call path profiles and call path traces of GPU-accelerated applications. HPCToolkit’s measurement and analysis tools attribute metrics to calling contexts that span both CPUs and GPUs. To help developers understand the performance of complex GPU code generated from high-level programming models, HPCToolkit constructs sophisticated approximations of call path profiles for GPU computations. To support fine-grained analysis and tuning, HPCToolkit uses PC sampling and instrumentation to measure and attribute GPU performance metrics to source lines, loops, and inlined code. To supplement fine-grained measurements, HPCToolkit can measure GPU kernel executions using hardware performance counters. To provide a view of how an execution evolves over time, HPCToolkit can collect, analyze, and visualize call path traces within and across nodes.

This talk will describe our recent experiences applying HPCToolkit to measure and analyze the performance of applications on the Frontier supercomputer at Oak Ridge National Laboratory. In our largest experiment to date, HPCToolkit was used to measure and analyze the performance of an execution of the LAMMPS molecular modeling code that used 64K MPI ranks and 64K GPU tiles for 15 minutes. For that execution, HPCToolkit recorded 5.6 TB of measurement data. HPCToolkit’s out-of-core strategies for data analysis and interactive presentation make working with data at this scale practical. This talk will highlight these strategies and include a live demo (network permitting).