July 13th


9:00AM–11:00AM PDT


John Mellor-Crummey, Rice University

Presenter Bio

John Mellor-Crummey is a Professor of Computer Science at Rice University in Houston, TX. His research focuses on software technology for high-performance parallel computing. His current research includes tools for measuring and analyzing application performance and tools for dynamic data race detection. He leads the research and development of the HPCToolkit performance tools, principally supported by the DOE Exascale Computing Project. His past work has included compilers and runtime systems for parallel computing, scalable software synchronization algorithms for shared-memory multiprocessors, techniques for execution replay of parallel programs, and techniques for network performance analysis and optimization. Mellor-Crummey co-led the development of the OMPT tools interface for OpenMP 5. He is a co-recipient of the 2006 Dijkstra Prize in Distributed Computing and a Fellow of the ACM.


1st hour: formal talk; 2nd hour: deep dives and discussions for folks who want to remain; follow-on: 1:1 sessions the following week


Rice University’s HPCToolkit performance tools have undergone significant enhancements to support measurement and analysis of parallel applications on both CPU-based and GPU-accelerated scalable parallel systems. For CPU-based platforms, I will describe the status of and plans for supporting forthcoming systems with Sapphire Rapids processors, including CTS-2 systems and Crossroads. For GPU-accelerated platforms, I will describe and demonstrate emerging capabilities for profiling and tracing GPU-accelerated applications on platforms equipped with NVIDIA, Intel, and AMD GPUs. Of particular note are (1) HPCToolkit’s ability to attribute performance to heterogeneous calling contexts for GPU kernels written using RAJA and Kokkos, and (2) HPCToolkit’s ability to combine hardware support, instrumentation, and compiler information to attribute performance to calling contexts, inlined code, loops, and statements within GPU kernels. Detailed attribution of application performance within GPU kernels is operational for NVIDIA and Intel GPUs; for systems based on AMD GPUs, we are awaiting support for fine-grained measurement within GPU kernels. Finally, HPCToolkit has received significant enhancements that accelerate post-mortem analysis of CPU and GPU binaries, as well as of performance data collected from large-scale executions.

For more information…

MS Teams (Virtual Meeting) Info

Information about the MS Teams meeting will be sent to you after you register for this tutorial.

1:1 Sessions

Teams can sign up for 1:1 sessions with the HPCToolkit team on Wednesday and Thursday.

The sign-up poll lists slots in half-hour increments. Use it to pick one or more adjacent slots when you or your group will meet individually with members of the Rice HPCToolkit team. If you want to ask some questions or offer some feedback, a half hour might be enough. If you want to try some hands-on work with your code, a minimum of one hour is more reasonable. Feel free to request as many contiguous time slots as you'd like. If you prefer, we can meet to get you started using HPCToolkit and later coordinate by email and/or schedule a follow-up meeting the following week. There are three people on my team (John Mellor-Crummey, Marty Itzkowitz, Mark Krentel) with accounts at LANL, LLNL, and SNL on systems where work on export-controlled codes is allowed, so we can work with up to three groups in each time slot.