Updates and User Training for the MPI tools Vampir and MUST

July 25-26, 2019

Date/Time/Location	Presentation and Demos: July 25, 2019, 10:00am-noon, B451 R1025 (White Room). Note that this location is a Property Protection Area. Foreign national temporary escorted building access procedures apply. Registration not required.
Hands-on Training Session: July 25, 2019, 1:30pm-5:00pm, Laboratory Training Center 2, Trailer 1889 Classroom 1 (near the West Gate Badge Office). This is a Common Access Area. Registration required; contact Dong Ahn (ahn1@llnl.gov).
Individual Meetings (by appointment): July 26, 2019, 9:00am-5:00pm, B453 R1016 (Vis Development Lab). Note that this location is a Property Protection Area. Foreign national temporary escorted building access procedures apply. Individual sessions must be scheduled through Dong Ahn (ahn1@llnl.gov).
Description	High performance computing system architectures challenge application developers with heterogeneity and increasing system scale. Tools help application developers and system support personnel tune applications for these systems and avoid correctness errors. We present the Vampir tool suite, which provides deep and detailed insights into application performance for various architectures and programming paradigms, and the MUST tool, which provides runtime error detection for MPI applications.
After a short introduction to the use cases that these tools target, we will present workflows and advanced features for tool usage at scale, as well as for multi-paradigm applications (e.g., OpenMP-MPI). We will conclude our presentations with a summary of novel features (from the last 2 years) and ongoing development. Most importantly, this includes the monitoring component Score-P that Vampir uses. This component unifies instrumentation and performance measurement for a wide range of tools.
Finally, we invite interested application developers and system support groups to discuss the best strategies for using our tools for their specific use cases, and to give us feedback on useful or missing functionality. Additionally, the presenters will be available on Jul 25 from 1:00-5:00pm for hands-on training sessions, and during the day of Jul 26 for individual meetings with application developers and development teams. Please contact Dong Ahn (ahn1@llnl.gov) if you are interested in the hands-on training session or in scheduling an individual meeting. See the "Additional Information" section below for a more in-depth discussion of these tools.
Presenters:	Dr. Matthias Weber, TU Dresden; Joachim Protze, RWTH Aachen
Registration:	No registration is required for the Jul 25 presentation. Seating is on a first-come, first-served basis. Reservations are required in advance for the Jul 25 hands-on session and individual meetings. Please contact Dong Ahn (ahn1@llnl.gov).
Additional Information:	http://www.itc.rwth-aachen.de/MUST/
https://vampir.eu/
https://pruners.github.io/archer/
The MUST tool targets the detection of MPI usage errors. It primarily serves to remove errors that manifest as hangs, crashes, or wrong results. Further, the tool helps in cases where it is unclear whether a defect exists in an application or in a system library, e.g., the MPI library. MUST provides a wide range of correctness analyses that include simple MPI resource usage issues, datatype mismatch situations, collective consistency analysis, and deadlock detection. An additional analysis checks whether communication buffers overlap, i.e., whether multiple MPI operations send and receive on the same memory regions. Such errors can be hard to track and reproduce in practice. All of MUST's checks target scalability and showed low overhead (usually below 100% increased application runtime) when tested with up to 16,384 processes on a BG/Q system.
Recent updates to MUST include basic analysis for multi-threaded MPI applications. Ongoing development of MUST considers OpenMP and hybrid MPI-OpenMP checks, as well as analysis based on compile-time information. Since we recently started to integrate the OpenMP data race detection tool Archer into the analysis of MUST, we will give a short introduction to the usage of Archer and show how the integrated use of MUST + Archer gives you the best debugging experience for hybrid MPI + OpenMP applications. By the nature of data race detection, this combination of tools can cause significant runtime overhead of 4-20x. Since data race detection has this cost anyway, we will investigate what additional benefits we can derive from this analysis.
The Vampir tool suite serves for performance analysis. The tool visualizes the behavior of massively parallel applications to highlight bottlenecks and inefficiencies. Basic profiling information guides the tool user towards interesting spots, and detailed timeline views then provide an understanding of why the application exhibits an inefficiency. This information provides application developers and maintainers with input for performance optimization. Repeated runs with optimized codes then highlight the effects and efficacy of the individual optimizations.
Vampir uses a post-mortem approach: the monitoring component Score-P captures application behavior during runtime, and the visualization component Vampir/VampirServer visualizes this data after an application run. The long, community-driven development of the monitor Score-P yields a wide range of features that include different instrumentation types, hardware performance counter support, energy counter support, and support for multiple programming paradigms (MPI, OpenMP, CUDA, ...). Scalability features of both Score-P and Vampir enable performance analysis with tens of thousands of processes; frontier use cases have applied Vampir with 200,000 MPI processes. In particular, the I/O analysis capabilities of Score-P and Vampir have been significantly enhanced in recent years. The tool suite now provides multiple performance charts for analyzing applications' I/O behavior.
Score-P and MUST are open source projects, and licenses for all features of Vampir are available. Installations of all tools are available on Tri-Lab clusters and on virtual machines for demonstration purposes.
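To illustrate the kind of condition behind the communication buffer overlap analysis mentioned for MUST above, here is a minimal Python sketch. It is not MUST's implementation and does not use MUST's API. The `PendingBuffer` record, its fields, and the rule of flagging only pairs in which at least one operation writes to the buffer (e.g., a receive) are assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PendingBuffer:
    """Hypothetical record of a buffer used by an in-flight MPI operation."""
    op: str        # label for the operation, e.g. "Irecv#2" (illustrative only)
    start: int     # start address of the buffer in bytes
    length: int    # buffer length in bytes
    writes: bool   # True if the operation writes to the buffer (a receive)

    @property
    def end(self) -> int:
        # Exclusive end address of the buffer.
        return self.start + self.length


def overlaps(a: PendingBuffer, b: PendingBuffer) -> bool:
    """Return True if two non-empty byte ranges share at least one byte."""
    if a.length <= 0 or b.length <= 0:
        return False
    return a.start < b.end and b.start < a.end


def find_conflicts(buffers):
    """List pairs of pending operations whose buffers overlap.

    Assumption of this sketch: a pair is only reported when at least one
    of the two operations writes to its buffer.
    """
    return [
        (a.op, b.op)
        for a, b in combinations(buffers, 2)
        if (a.writes or b.writes) and overlaps(a, b)
    ]


if __name__ == "__main__":
    pending = [
        PendingBuffer("Isend#1", start=0, length=64, writes=False),
        PendingBuffer("Irecv#2", start=32, length=64, writes=True),
        PendingBuffer("Irecv#3", start=96, length=16, writes=True),
    ]
    for first, second in find_conflicts(pending):
        print(f"overlapping buffers: {first} and {second}")
```

The pairwise comparison is quadratic in the number of pending operations and serves only to show the condition being checked. A tool that targets scalability would need a more efficient scheme.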
Fee	No cost
Level/Prerequisites	Familiarity with running parallel programs in an HPC environment is highly recommended.
Questions?	Please call or send e-mail to Blaise Barney (925-422-2578 / blaiseb@llnl.gov)