STAT: Stack Trace Analysis Tool

The Stack Trace Analysis Tool (STAT) is a highly scalable, lightweight debugger for parallel applications. STAT works by gathering stack traces from all of a parallel application's processes and merging them into a compact and intuitive form. The resulting output indicates the location in the code that each application process is executing, which can help narrow down a bug. Furthermore, the merging process naturally groups processes that exhibit similar behavior into process equivalence classes. A single representative of each equivalence class can then be examined with a full-featured debugger like TotalView or DDT for more in-depth analysis.
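To make the merging step concrete, the following Python sketch merges per-task stack traces into a call prefix tree in which every node records the set of tasks that reached it, and then groups tasks into equivalence classes. It is an illustration under simplifying assumptions, not STAT's implementation: traces are plain lists of function names (outermost frame first), and two tasks are treated as equivalent when they visited exactly the same set of tree nodes. The names Node, merge_traces, and equivalence_classes are hypothetical.

```python
from collections import defaultdict


class Node:
    """One frame in the call prefix tree, labelled with the tasks that reached it."""

    def __init__(self, name):
        self.name = name
        self.tasks = set()
        self.children = {}  # frame name -> Node

    def child(self, name):
        # Create the child on first use so identical prefixes share one node.
        if name not in self.children:
            self.children[name] = Node(name)
        return self.children[name]


def merge_traces(traces):
    """Merge {task rank: [outermost frame, ..., innermost frame]} into a prefix tree."""
    root = Node("root")
    for rank, frames in traces.items():
        node = root
        node.tasks.add(rank)
        for frame in frames:
            node = node.child(frame)
            node.tasks.add(rank)
    return root


def equivalence_classes(root):
    """Group tasks that visited exactly the same set of tree nodes."""
    visited = defaultdict(set)  # rank -> set of root-to-node paths it visited

    def walk(node, path):
        for rank in node.tasks:
            visited[rank].add(path)
        for name, kid in node.children.items():
            walk(kid, path + (name,))

    walk(root, ())
    classes = defaultdict(list)
    for rank, paths in visited.items():
        classes[frozenset(paths)].append(rank)
    return sorted(sorted(ranks) for ranks in classes.values())
```

For example, with tasks 0, 2, and 3 stopped in ["main", "MPI_Barrier"] and task 1 in a hypothetical ["main", "compute"], equivalence_classes(merge_traces(traces)) yields [[0, 2, 3], [1]]; taking the lowest rank of each class gives one representative per class to hand to a full-featured debugger.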

STAT has been ported to several platforms, including Linux clusters, IBM CORAL systems (i.e., IBM Power CPUs with NVIDIA GPUs), IBM's Blue Gene machines, and Cray systems. It works for Message Passing Interface (MPI) applications written in C, C++, and Fortran, and it supports threads and CUDA. STAT has already demonstrated scalability to over 1,000,000 MPI tasks, and its logarithmic scaling characteristics position it well for even larger systems.

STAT is developed as a collaboration between the Lawrence Livermore National Laboratory, the University of Wisconsin, and the University of New Mexico. It is currently open source software released under the Berkeley Software Distribution (BSD) license. It builds on a highly portable, open source infrastructure, including LaunchMON for tool daemon launching, MRNet for scalable communication, and Dyninst for obtaining stack traces.

Platforms and Locations

Platform        Location                       Notes
x86_64 TOSS 3   /usr/tce/packages/stat/*       Multiple versions are available. Use module to load.
BG/Q            /usr/local/tools/stat
CORAL           /usr/tcetmp/packages/stat/*    Usage details are on the LLNL internal wiki (requires authentication).

Quick Start

In a typical scenario, the STAT GUI (the stat-gui command) is used to debug a running or hung application, as described below. STAT can also be used in command-line mode (the stat-cl command), which is covered in the STAT Documentation. For details on how to use STAT on CORAL systems, please refer to the LC Confluence wiki page.

1. First, determine where (which node) the hung application is running. For parallel MPI jobs, this means finding where the srun (or equivalent mpirun, orterun, jsrun, lrun, etc.) master process is running.

On Linux clusters, use either the mjstat or squeue command. For mjstat, look under the "Master/Other" column. For squeue, it is the first node in the node list.

% mjstat

Scheduling pool data:
-------------------------------------------------------------
Pool        Memory  Cpus  Total Usable   Free  Other Traits 
-------------------------------------------------------------
pdebug     30000Mb    16     32     32     22  
pbatch*    30000Mb    16   1200   1193      5  

Running job data:
----------------------------------------------------------------------
JobID    User      Nodes Pool      Status        Used  Master/Other  
----------------------------------------------------------------------
787193   user1         4 pbatch    R             1:22  cab124
787048   user2        32 pbatch    R             4:40  cab1164
...
787187   userN        16 pbatch    R             4:42  cab1206


% squeue | grep 787051
787051    pbatch    Wflow schaich2   R      11:19     32 cab[177-196,280-283,288,290,296-299,304-305]
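If you want to script this step, the first node can be pulled out of squeue's compressed node list. The sketch below is a hypothetical helper, first_node, that handles a plain hostname, a comma-separated list, or a single prefix[ranges] group like the one shown above; it does not cover every hostlist form (for example, several bracketed groups in one name). On SLURM systems, scontrol show hostnames <nodelist> performs the full expansion for you.

```python
import re


def first_node(nodelist):
    """Return the first host in a SLURM node list such as 'cab[177-196,280-283]'.

    Handles a plain hostname, a comma-separated list, or a single
    'prefix[ranges]' group; zero-padded ranges keep their padding.
    """
    # The first top-level entry ends at the first comma outside brackets.
    depth = 0
    for i, ch in enumerate(nodelist):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            nodelist = nodelist[:i]
            break
    match = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", nodelist)
    if match is None:
        return nodelist  # no brackets: already a single hostname
    prefix, ranges, suffix = match.groups()
    first = ranges.split(",")[0].split("-")[0]  # start of the first range
    return prefix + first + suffix
```

For the job above, first_node("cab[177-196,280-283,288,290,296-299,304-305]") returns "cab177", the node to search for the srun master process.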

On BG/Q systems, the srun master process will be running on a front-end node, though not necessarily the one you are logged into. Use either of the methods shown below to determine where it is running.

vulcanlac8@lee218:squeue -o "%i %u %B" | grep `whoami`
5602696 lee218 vulcanlac5

vulcanlac8@lee218:scontrol show job 5602696 | grep BatchHost
   BatchHost=vulcanlac5
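The scontrol output is a series of whitespace-separated Key=Value tokens, so a script can recover the batch host without grep. A minimal sketch, assuming the output has already been captured as a string; the helper name batch_host is hypothetical, and splitting on whitespace is only safe here because BatchHost values contain no spaces.

```python
def batch_host(scontrol_output):
    """Return the BatchHost value from `scontrol show job <id>` output, or None."""
    for token in scontrol_output.split():
        key, sep, value = token.partition("=")
        if sep and key == "BatchHost":
            return value
    return None
```

On a SLURM system, the string could come from subprocess.run(["scontrol", "show", "job", job_id], capture_output=True, text=True).stdout; for the job above, batch_host would return "vulcanlac5".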

On Sierra systems, jobs submitted through bsub are placed on a launch node. Here is how to determine the launch node of a particular job:

[lee218@sierra4359:~]$ bjobs -noheader -X -o 'first_host' 241815
sierra4370

2. Start the STAT GUI using the stat-gui command.

  • Case 1: If the srun master process is running on the node where you start STAT, it will appear in STAT's Attach window. Click on the Attach button to have STAT attach to the process and gather a stack trace.
  • Case 2: If the srun master process is NOT running on the node where you started STAT, type in the name of the node where it is running (obtained from step 1 above) and then click the Search Remote Host button. STAT will find the srun master process on the remote host, as in Case 1. Then, click on the Attach button to have STAT attach to the process and gather a stack trace.

 

3. After STAT attaches to your application, a merged stack trace will appear (example below). At this point, you are able to interact with the STAT GUI to debug your application. For details on using the STAT GUI, please see the STAT User Guide.

Using the STAT GUI

STAT includes a graphical user interface (GUI) to run STAT and to visualize the call prefix trees that STAT outputs. This GUI provides a variety of operations to help focus on particular call paths and tasks of interest. It can also be used to identify the various equivalence classes, and it includes an interface to attach a heavyweight debugger to the representative subset of tasks.

The STAT GUI is available on all CHAOS 5 x86_64 machines and BlueGene systems in /usr/local/bin/stat-gui and on TCE and CORAL systems in /usr/tce/bin/stat-gui. Man pages are also available (man stat-gui). Full documentation can be found in /usr/local/tools/stat/doc/ or /usr/tce/packages/stat/default/doc/ and in the STAT User Guide.

The toolbar on the left allows access to STAT's core operations on the application:

  • Attach - creates a dialog (Figure 5) to attach to a parallel application and set various options.
  • ReAttach - reattach to the last attached parallel application (bypasses the attach dialog).
  • Detach - detach from the application.
  • Resume - resumes the stopped application processes.
  • Pause - pauses the application processes.
  • Sample - pauses the application processes and gathers a single stack trace.
  • Sample Multiple - gathers multiple stack traces from the application processes, letting the processes run for a specified amount of time between samples.

The attach dialog allows you to select the application to attach to. Note: you will want to attach to the job launcher process (srun on CHAOS and BlueGene/Q systems, or mpirun on BlueGene/P systems). By default, the attach dialog searches the localhost for the job launcher process, but you may specify an alternative hostname in the Search Remote Host text entry field. Thus, you may attach STAT to a batch job from a login node. On TOSS systems, the appropriate host is usually the lowest-numbered node in your allocation. On BlueGene/P systems, there is usually a single dedicated node (e.g., rzdawndev4 or dawn13), and on BlueGene/Q, it is one of the dedicated nodes (e.g., seqlac5, seqlac6, vulcanlac5, or vulcanlac6). In general, to find the appropriate node, you may be able to run:

squeue -j <your_slurm_job_id> -tr -o "%.7i %B"

When you left-click on a node in the graph, you will get a pop-up window that lists the function name and the full set of tasks that took that call path. Right-clicking on a node provides a pop-up menu with the same options.

The pop-up window has several buttons that allow you to manipulate the graph, allowing you to focus on areas of interest. Each button is defined as follows:

  • Join Equivalence Class - collapses all of the descendant nodes with the same equivalence class into the current node and renders the result in a new tab.
  • Collapse - hide all of the descendants of the selected node.
  • Collapse Depth - collapse the entire tree to the depth of the selected node.
  • Hide - the same as Collapse, but also hides the selected node.
  • Expand - show (unhide) the immediate children of the selected node.
  • Expand All - show (unhide) all descendants of the selected node.
  • Focus - hide all nodes that are neither ancestors nor descendants of the selected node. (Note: This will not unhide any hidden ancestors.)
  • View Source - creates a pop-up window (Figure 7) displaying the source file (only for stack traces with line number information). Requires the source file's path to be added to the search path through File -> Add Search Paths.
  • Temporally Order Children (prototype only) - determine the temporal order of the node's children (only for stack traces with line number information). Requires the source file's path and all include paths to be added to the search path, through File -> Add Search Paths.
  • OK - closes the pop-up window.
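The Focus and Collapse Depth rules above can be stated directly in terms of the prefix tree. The sketch below models them over a tree stored as a hypothetical {node: [child, ...]} dictionary and returns the set of nodes left visible; it illustrates the rules as described, is not the GUI's code, assumes node names are unique (real prefix trees can repeat a function name on different paths), and ignores the caveat that Focus does not unhide hidden ancestors.

```python
def descendants(children, node):
    """All nodes strictly below `node` in a {node: [child, ...]} tree."""
    found = []
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return found


def ancestors(children, root, node):
    """Nodes on the path from `root` down to (but excluding) `node`."""
    def path_to(current):
        if current == node:
            return []
        for kid in children.get(current, []):
            rest = path_to(kid)
            if rest is not None:
                return [current] + rest
        return None  # `node` is not below `current`

    return path_to(root) or []


def focus(children, root, node):
    """Visible set after Focus: the node, its ancestors, and its descendants."""
    return {node, *ancestors(children, root, node), *descendants(children, node)}


def collapse_depth(children, root, node):
    """Visible set after Collapse Depth: every node no deeper than `node`."""
    target = len(ancestors(children, root, node))
    visible, frontier, depth = set(), [root], 0
    while frontier and depth <= target:
        visible.update(frontier)
        frontier = [kid for n in frontier for kid in children.get(n, [])]
        depth += 1
    return visible
```

With a toy tree in which main calls MPI_Barrier and a hypothetical compute routine, focusing on compute keeps root, main, compute, and everything under compute, while Collapse Depth on compute keeps every node at compute's depth or above, including the sibling MPI_Barrier.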

The main window also has several tree manipulation options (note that all of these operate on the current, visible state of the tree):

  • Undo - Undo the previous operation.
  • Redo - Redo the undone operation.
  • Reset - Revert to the original graph.
  • Layout - Reset the layout of the current graph and open in a new tab. This is useful for compacting wide trees after performing some pruning operations.
  • Cut - This feature allows you to collapse the prefix tree below the implementation frames for various programming models. MPI and pthreads are preconfigured in STAT, and additional programming models can be specified via regular expressions.
  • Join - Join consecutive nodes of the same equivalence class into a single node and render in a new tab. This is useful for condensing long call sequences.
  • [Traverse] Eq C - Traverse the prefix tree by expanding the leaves to the next equivalence class set. The first click will display the top-level equivalence class.
  • [Traverse Longest] Path - Focus the traversal on the next longest call path(s). The first click will focus on the longest path.
  • [Traverse Shortest] Path - Focus the traversal on the next shortest call path(s). The first click will focus on the shortest path.
  • [Traverse Least] Tasks - Focus the traversal on the path(s) visited by the next-fewest tasks. The first click will focus on the path visited by the fewest tasks.
  • [Traverse Most] Tasks - Focus the traversal on the path(s) visited by the next-most tasks. The first click will focus on the path visited by the most tasks.
  • [Traverse Least] TO - Focus a temporal order traversal on the path(s) that have made the least execution progress in the application. The first click will focus on the path that has made the least progress.
  • [Traverse Most] TO - Focus a temporal order traversal on the path(s) that have made the most execution progress in the application. The first click will focus on the path that has made the most progress.
  • Search - Search for call paths containing specified text, taken by specified tasks, or from specified hosts. Search text may be a regular expression, using the syntax described in http://docs.python.org/library/re.html.
  • EQ Classes - identify the equivalence classes of the tree. After clicking on this button, a window will pop up showing the complete list of equivalence classes. You can then select a single representative, all, or none of an equivalence class's tasks to form a subset. The "Attach" buttons will launch the specified debugger and attach to the subset of tasks (note that this detaches STAT from the application). The "Debugger Options" button allows you to modify the debugger path.
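Because Search accepts Python regular expressions, a pattern can be tried out in plain Python before it is typed into the dialog. The sketch below applies a pattern to a few made-up call paths with re.search, which matches anywhere in a frame name unless the pattern is anchored; the paths and the helper name matching_paths are hypothetical, and the exact matching rules of STAT's Search dialog are not assumed here. It also shows the deepest path, the one the first [Traverse Longest] Path click focuses on.

```python
import re


def matching_paths(call_paths, pattern):
    """Return the call paths in which at least one frame matches `pattern`."""
    regex = re.compile(pattern)
    return [path for path in call_paths if any(regex.search(frame) for frame in path)]


# Hypothetical call paths (outermost frame first), one per group of tasks.
call_paths = [
    ("_start", "main", "MPI_Barrier", "PMPI_Barrier"),
    ("_start", "main", "exchange_halo", "MPI_Waitall", "PMPI_Waitall"),
    ("_start", "main", "compute"),
]

# Paths that entered an MPI or PMPI routine (pattern anchored at the frame start).
mpi_paths = matching_paths(call_paths, r"^P?MPI_")

# The deepest path, i.e. the one [Traverse Longest] Path focuses on first.
longest = max(call_paths, key=len)
```

Here mpi_paths contains the first two paths, and longest is the five-frame path through exchange_halo.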

Using The stat-cl Command Line

STAT can also be run from the command line via the stat-cl command. The only required argument to stat-cl is the PID of the srun process or, if srun is running on a different node, the hostname:PID of the srun process:

vulcanlac5@lee218:ps xw | grep srun
17063 pts/0    Sl     0:00 srun /usr/local/tools/stat/share/STAT/examples/bin/mpi_ringtopo
17064 pts/0    S      0:00 srun /usr/local/tools/stat/share/STAT/examples/bin/mpi_ringtopo

vulcanlac5@lee218:stat-cl 17063
STAT started at 2017-06-12-12:55:27
...
Results written to /g/g0/lee218/stat_results/mpi_ringtopo.0094

Or when running from a remote node:

vulcanlac8@lee218:rsh vulcanlac5 ps x | grep srun
17063 pts/0    Sl     0:00 srun /usr/local/tools/stat/share/STAT/examples/bin/mpi_ringtopo
17064 pts/0    S      0:00 srun /usr/local/tools/stat/share/STAT/examples/bin/mpi_ringtopo

vulcanlac8@lee218:stat-cl vulcanlac5:17063
STAT started at 2017-06-12-12:57:13
...
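When ps lists several srun processes, as in the sessions above, the stat-cl argument can be built programmatically. A minimal sketch with a hypothetical helper, stat_cl_target, that takes captured ps output, keeps lines whose command is srun, picks the lowest PID (the first srun, which is the one to attach to on BG/Q), and prefixes a hostname when srun runs on another node. It assumes the default PID TTY STAT TIME COMMAND column layout and that every srun listed belongs to the job of interest.

```python
import os


def stat_cl_target(ps_output, host=None):
    """Build the stat-cl argument from `ps x` output: "PID", or "host:PID" for a remote host."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected columns: PID TTY STAT TIME COMMAND [ARGS...]
        if len(fields) >= 5 and fields[0].isdigit() and os.path.basename(fields[4]) == "srun":
            pids.append(int(fields[0]))
    if not pids:
        raise ValueError("no srun process found in ps output")
    pid = min(pids)  # on BG/Q, attach to the first (lower-PID) srun
    return f"{host}:{pid}" if host else str(pid)
```

For the ps output shown above, stat_cl_target(output) returns "17063", and stat_cl_target(output, host="vulcanlac5") returns "vulcanlac5:17063", matching the two stat-cl invocations.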

Note that on BG/Q systems, each srun invocation results in two srun processes, and you will want to attach to the first one (i.e., the one with the lower PID).

STAT will create a stat_results directory in your current working directory. This directory will contain a subdirectory, whose name is based on your parallel application's executable name, with the merged stack traces in DOT graph description language format. These .dot files can be used to view your application's stack traces (post-execution) with the stat-view utility, which takes .dot files as arguments:

  vulcanlac5@lee218:stat-view /g/g0/lee218/stat_results/mpi_ringtopo.0094/*.dot
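To open the most recent results without typing the run number, a script can pick the newest run directory under stat_results. This sketch assumes the <executable>.<NNNN> naming seen in the example (mpi_ringtopo.0094), with a numeric suffix that grows with each run; that convention is inferred from the example rather than documented here, so treat the helper latest_dot_files as hypothetical.

```python
from pathlib import Path


def latest_dot_files(results_dir, executable):
    """Return the .dot files from the highest-numbered <executable>.<N> run directory."""
    runs = []
    for path in Path(results_dir).glob(f"{executable}.*"):
        suffix = path.name[len(executable) + 1:]
        if path.is_dir() and suffix.isdigit():
            runs.append((int(suffix), path))
    if not runs:
        return []  # no run directories for this executable
    _, newest = max(runs)
    return sorted(newest.glob("*.dot"))
```

The returned paths could then be handed to stat-view, for example with subprocess.run(["stat-view", *map(str, files)]).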

Automatically Launching STAT Via The IO Watchdog Utility

STAT can be used in conjunction with the IO Watchdog utility, which monitors application output to detect hangs. To enable STAT with the IO Watchdog, add the following to the file $HOME/.io-watchdogrc:

search /usr/local/tools/io-watchdog/actions
timeout = 20m
actions = STAT, kill

You will then need to run your application with the srun --io-watchdog option:

% srun --io-watchdog mpi_application

When STAT is invoked, it will create a stat_results directory in the current working directory, as it would in a typical STAT run. The resulting .dot files can then be viewed with stat-view. For more details about using IO Watchdog, refer to the IO Watchdog README file in /usr/local/tools/io-watchdog/README.

Run-time Options

STAT includes a number of options, preferences, and environment variables that influence how it behaves. See the STAT User Guide for details.


LLNL-WEB-670397