TAU: Tuning and Analysis Utilities

TAU (Tuning and Analysis Utilities) is a comprehensive profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. It is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements. All C++ language features are supported including templates and namespaces. The instrumentation consists of calls to TAU library routines, which can be incorporated into a program in several ways:

  • Automatic instrumentation using the compiler
  • Automatic instrumentation using the Program Database Toolkit (PDT)
  • Manual instrumentation using the instrumentation API
  • At runtime using library call interception through the tau_exec command
  • Dynamically using DyninstAPI
  • At runtime in the Java virtual machine

Data Analysis and Visualization:

  • Profile data: TAU's profile visualization tool, ParaProf, provides a variety of graphical displays for profile data to help users quickly identify sources of performance bottlenecks. The text-based pprof tool is also available for analyzing profile data.
  • Trace data: TAU provides the Jumpshot trace visualization tool for graphical viewing of trace data. TAU also provides utilities to convert trace data into formats for viewing with Vampir, Paraver and other performance analysis tools.

Programming models and platforms: TAU supports most commonly used parallel hardware and programming models, including Intel, Cray, IBM, Sun, Apple, SGI, GPUs/Accelerators, HP, NEC, Fujitsu, MS Windows, using MPI, OpenMP, Pthreads, OpenCL, CUDA and Hybrid.

Platforms and Locations

Platform       Location                 Notes
x86_64 Linux   /usr/global/tools/tau/   Load the dotkit package: use tau
BG/Q           /usr/global/tools/tau/   Load the dotkit package: use tau

Quick Start

TAU is a sophisticated, full-featured toolkit. Only a subset of TAU's features is discussed below, at an introductory level. Consult the TAU documentation to learn more.

1. Profiling

The easiest and quickest way to profile an application is to use the tau_exec command. It automatically instruments your executable at run time, and requires no special compilation or modifications to source code. All you need to do is make sure your TAU environment is set up correctly.

1. Set up your TAU environment by loading the TAU dotkit package. Also, to be sure profiling is enabled, set the TAU_PROFILE environment variable to "1". Optionally, you can specify where the profile files are written with PROFILEDIR (the default is the working directory).

% use tau
Prepending: tau (ok)
% setenv TAU_PROFILE 1
% setenv PROFILEDIR  /p/lscratche/joesmith/matmultProfiles 

2. Run your program using the tau_exec command. For example, launching a 64 task MPI job in the pdebug partition:

% srun -n64 -ppdebug tau_exec matmult 

3. Following completion of your job, you will have a set of files named profile.#.* where # denotes the MPI rank. Viewing these files is discussed in the Output section below.

Another way to automatically instrument your application is to use the TAU Makefile scripts. This is slightly more work, but is required for profiling some parameters, such as hardware counter (PAPI) events.

1. Set up your TAU environment by loading the TAU dotkit package. Also, to be sure profiling is enabled, set the TAU_PROFILE environment variable to "1". Optionally, you can specify where the profile files are written with PROFILEDIR (the default is the working directory).

% use tau
Prepending: tau (ok)
% setenv TAU_PROFILE 1
% setenv PROFILEDIR  /p/lscratche/joesmith/matmultProfiles

2. Determine the path of the TAU libraries you've loaded. An easy way to do this is shown below:

% use -hv tau | grep LIBRARY
dk_alter LD_LIBRARY_PATH /usr/local/tools/tau-2.21.1/x86_64/lib

3. Using the TAU library path from above, select the appropriate TAU Makefile for what you want to profile. They are named according to what they instrument. For example:

% ls /usr/local/tools/tau-2.21.1/x86_64/lib/Makefile*
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-depthlimit-icpc-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-depthlimit-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-mpi-pdt-openmp-opari
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-papi-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-papi-mpi-pdt-openmp-opari
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-papi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-papi-pdt-openmp-opari
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-papi-pthread-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-pdt-openmp-opari
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-pthread-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-mpi-pdt-openmp-opari
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-papi-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-papi-mpi-pdt-openmp-opari
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-papi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-papi-pdt-openmp-opari
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-papi-pthread-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-param-icpc-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-param-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-pdt-openmp-opari
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-phase-icpc-papi-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-phase-papi-mpi-pdt
/usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-pthread-pdt 
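
Because the Makefile names are built from hyphen-separated components, the listing can be narrowed mechanically. The following is a minimal sketch for a Bourne-style shell (sh or bash) rather than the csh used elsewhere on this page; the function name filter_tau_makefiles is hypothetical and not part of TAU. It reads Makefile paths on standard input and prints only those whose names contain every component given as an argument.

```shell
# Hypothetical helper, not part of TAU: keep only the Makefile paths
# (one per line on standard input) whose names contain every requested
# component, for example "papi" and "mpi".
filter_tau_makefiles() {
    while IFS= read -r path; do
        name=${path##*/}                      # drop the directory part
        parts="-${name#Makefile.tau-}-"       # e.g. -icpc-papi-mpi-pdt-
        keep=1
        for want in "$@"; do
            case "$parts" in
                *"-$want-"*) ;;               # component present
                *) keep=0 ;;                  # component missing
            esac
        done
        if [ "$keep" -eq 1 ]; then
            echo "$path"
        fi
    done
    return 0
}

# Demonstration on three names taken from the listing above.
printf '%s\n' \
    Makefile.tau-icpc-papi-mpi-pdt \
    Makefile.tau-mpi-pdt \
    Makefile.tau-papi-pthread-pdt |
    filter_tau_makefiles papi mpi
```

On a system with TAU loaded, the same function could be fed from the ls command shown above, for example: ls /usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-* | filter_tau_makefiles papi mpi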

4. Set TAU_MAKEFILE to the full pathname of the Makefile you choose. The first example uses the Intel compiler to profile MPI and PAPI, and the second example uses the GNU compiler:

% setenv TAU_MAKEFILE /usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-icpc-papi-mpi-pdt

or

% setenv TAU_MAKEFILE /usr/local/tools/tau-2.21.1/x86_64/lib/Makefile.tau-papi-mpi-pdt

5. Compile your application using the appropriate TAU compiler wrapper script. These are located in the /bin directory of the TAU package you loaded, and should be in your path. The choices are shown in the table below. Note that if you are using makefiles, you will need to substitute these wrapper scripts accordingly.

Language    TAU Compiler Wrapper
C           tau_cc.sh
C++         tau_cxx.sh
Fortran77   tau_f77.sh
Fortran90   tau_f90.sh

For example, compiling a simple C program:

% tau_cc.sh -O3 -g -o matmult matmult.c

Note that compiler options will get passed to the native compiler of your choice. Also note that TAU provides a number of its own compiler options, not discussed here. For details, see TAU Compiler Options.

6. Run your TAU instrumented executable as usual. For example, launching a 64 task MPI job in the pdebug partition:

% srun -n64 -ppdebug matmult

7. Following completion of your job, you will have a set of files named profile.#.* where # denotes the MPI rank. Viewing these files is discussed in the Output section below.

2. Tracing

TAU can be used to trace events during a program's execution. Unlike profiling, which aggregates the time spent in each routine, loop, and so on, tracing lets you view events in relation to each other along a timeline. One caveat is that trace files can quickly grow very large, which makes tracing difficult or impossible for long-running jobs with many processes.

As with profiling, the easiest and quickest way to trace an application is to use the tau_exec command. It automatically instruments your executable at run time, and requires no special compilation or modifications to source code. All you need to do is make sure your TAU environment is set up correctly.

1. Set up your TAU environment by loading the TAU dotkit package. Also, make sure the TAU_TRACE environment variable is set to "1". If you want to specify a directory where the trace files should be written (the default is the working directory), use the TRACEDIR environment variable.

% use tau
Prepending: tau (ok)
% setenv TAU_TRACE 1
% setenv TRACEDIR /p/lscratche/joesmith/matmultTracefiles

Note: If you want to include TAU profiling at the same time as tracing, set the TAU_PROFILE environment variable to "1". By default, it is turned off when tracing.

2. Run your program using the tau_exec command. For example, launching a 64 task MPI job in the pdebug partition:

% srun -n64 -ppdebug tau_exec matmult 

3. Following completion of your job, you will have two sets of files named tautrace.#.*.trc and events.#.edf, where # denotes the MPI rank. Viewing these files is discussed in the Output section below.

As with profiling, you can automatically instrument your application for tracing by using the TAU makefile scripts, as an alternative to using the tau_exec command.

1. First, set up your TAU environment by loading the TAU dotkit package. Also, make sure the TAU_TRACE environment variable is set to "1". If you want to specify a directory where the trace files should be written (the default is the working directory), use the TRACEDIR environment variable.

% use tau
Prepending: tau (ok)
% setenv TAU_TRACE 1
% setenv TRACEDIR /p/lscratche/joesmith/matmultTracefiles 

2. Then, follow steps 2 through 6 of the TAU Makefile procedure under Profiling above.

3. Following completion of your job, you will have two sets of files named tautrace.#.*.trc and events.#.edf, where # denotes the MPI rank. Viewing these files is discussed in the Output section below.

3. PAPI Hardware Counters

TAU can be used to record hardware events through PAPI hardware counters. This is actually a type of profiling, so the instructions are very similar to those for Profiling above.

1. Follow steps 1 through 5 under Profiling using TAU Makefiles to build an instrumented executable.

2. Determine which PAPI events are available on the platform you are using with the papi_avail command:

% papi_avail
Available events and hardware information.
--------------------------------------------------------------------------------
PAPI Version             : 5.0.1.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (45)
CPU Revision             : 6.000000
CPUID Info               : Family: 6  Model: 45  Stepping: 6
CPU Max Megahertz        : 2601
CPU Min Megahertz        : 1200
Hdw Threads per core     : 1
Cores per Socket         : 8
NUMA Nodes               : 2
CPUs per Node            : 8
Total CPUs               : 16
Running in a VM          : no
Number Hardware Counters : 11
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------

    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes   No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes   Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  Yes   No   Level 2 instruction cache misses
PAPI_L3_DCM  0x80000004  No    No   Level 3 data cache misses
PAPI_L3_ICM  0x80000005  No    No   Level 3 instruction cache misses
PAPI_L1_TCM  0x80000006  Yes   Yes  Level 1 cache misses
PAPI_L2_TCM  0x80000007  Yes   No   Level 2 cache misses
PAPI_L3_TCM  0x80000008  Yes   No   Level 3 cache misses
PAPI_CA_SNP  0x80000009  No    No   Requests for a snoop

[Lines of output deleted here]

PAPI_FP_OPS  0x80000066  Yes   Yes  Floating point operations
PAPI_VEC_SP  0x80000069  Yes   Yes  Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  Yes   Yes  Double precision vector/SIMD instructions
PAPI_REF_CYC 0x8000006b  Yes   No   Reference clock cycles
-------------------------------------------------------------------------
Of 108 possible events, 50 are available, of which 17 are derived.

3. The output above reports 50 available events, but the processor has only 11 hardware counters and not all events can be counted together, so in practice you can usually record only a few at a time. Decide which events you want to count, and then check whether they are compatible with the papi_event_chooser command. The example below shows an incompatibility.

% papi_event_chooser PAPI_LD_INS PAPI_SR_INS PAPI_L1_DCM PAPI_L1_ICH
Event Chooser: Available events which can be added with given events.
--------------------------------------------------------------------------------
PAPI Version             : 5.0.1.0
Vendor string and code   : GenuineIntel (1)
Model string and code    : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (45)
CPU Revision             : 6.000000
CPUID Info               : Family: 6  Model: 45  Stepping: 6
CPU Max Megahertz        : 2601
CPU Min Megahertz        : 1200
Hdw Threads per core     : 1
Cores per Socket         : 8
NUMA Nodes               : 2
CPUs per Node            : 8
Total CPUs               : 16
Running in a VM          : no
Number Hardware Counters : 11
Max Multiplex Counters   : 64
--------------------------------------------------------------------------------

Event PAPI_L1_ICH can't be counted with others -7 

4. Set the COUNTER environment variables for compatible events of interest. It is recommended (or required) to set COUNTER1 to an available timer, such as GET_TIME_OF_DAY. For example:

% setenv COUNTER1 GET_TIME_OF_DAY
% setenv COUNTER2 PAPI_L1_DCM
% setenv COUNTER3 PAPI_L1_ICM
% setenv COUNTER4 PAPI_L2_DCM
% setenv COUNTER5 PAPI_L2_ICM 
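
The setenv lines above are csh syntax. For users working in a Bourne-style shell (sh or bash), the same variables can be set with export. The sketch below is one way to do that; the helper name set_tau_counters is hypothetical and not part of TAU, and it simply assumes the convention described in this step, with the timer in COUNTER1 and the PAPI events in the following slots.

```shell
# Hypothetical helper, not part of TAU: put a timer in COUNTER1 and the
# PAPI events given as arguments in COUNTER2, COUNTER3, and so on.
set_tau_counters() {
    n=1
    for event in GET_TIME_OF_DAY "$@"; do
        export "COUNTER$n=$event"
        n=$((n + 1))
    done
}

set_tau_counters PAPI_L1_DCM PAPI_L1_ICM PAPI_L2_DCM PAPI_L2_ICM

# Show what was set.
echo "COUNTER1=$COUNTER1"
echo "COUNTER5=$COUNTER5"
```

The event names still need to be checked for compatibility with papi_event_chooser, as in the previous step; the helper does no validation.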

5. Run your application as usual. Following execution, you will have a unique directory for each PAPI event. Inside each directory, there will be a set of files named profile.#.* where # denotes the MPI rank. Viewing these files is discussed in the Output section below.

% ls
MULTI__GET_TIME_OF_DAY  MULTI__PAPI_L2_DCM  matmult.f90 
MULTI__PAPI_L1_DCM      MULTI__PAPI_L2_ICM  matmult.o
MULTI__PAPI_L1_ICM      Makefile            matmult       
% ls MULTI__PAPI_L1_DCM
profile.0.0.0  profile.2.0.0  profile.4.0.0  profile.6.0.0
profile.1.0.0  profile.3.0.0  profile.5.0.0  profile.7.0.0 

Note: You can perform tracing at the same time as recording PAPI events by setting the TAU_TRACE environment variable to "1". You cannot, however, perform normal TAU profiling at the same time as PAPI.

4. Selective and Manual Instrumentation

TAU provides the ability for users to customize their application's instrumentation, thereby enabling them to focus on specific areas of interest, and reduce run-time overhead associated with profiling the entire application. There are two ways to do this, as discussed below.

Selective Instrumentation

1. Create a text file that contains the names of routines and/or source files that should be instrumented or not instrumented. The type of instrumentation can also be specified: loops, memory, I/O, etc.

2. Build your instrumented executable, following steps 1 through 5 under Profiling with TAU Makefiles above, and be sure to do the following:

  • Use a TAU Makefile that includes -pdt in its name
  • Include the TAU compiler wrapper script option -optTauSelectFile=filename, where filename is the name of your selective instrumentation text file.

For additional details, including the required syntax for the selective instrumentation text file, see the TAU documentation at https://www.cs.uoregon.edu/research/tau/docs/newguide/bk01ch01s03.html
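
As an illustration only, the sketch below writes a small selective instrumentation file and passes it to the compiler wrapper. It is written for a Bourne-style shell. The file name select.tau, the excluded source file timer_utils.c, and the choice of routine are assumptions made for this example; the section keywords follow the TAU selective instrumentation format, but the exact rules for matching routine names should be checked against the TAU documentation linked above.

```shell
# Write a hypothetical selective instrumentation file:
#   - skip every routine in timer_utils.c (an assumed file name)
#   - add loop-level timers inside multiply_matrices in matmult.c
cat > select.tau <<'EOF'
BEGIN_FILE_EXCLUDE_LIST
timer_utils.c
END_FILE_EXCLUDE_LIST

BEGIN_INSTRUMENT_SECTION
loops file="matmult.c" routine="multiply_matrices"
END_INSTRUMENT_SECTION
EOF

# Build with the TAU wrapper only if it is on the path and the source
# file exists. TAU_MAKEFILE must already name a Makefile with -pdt.
if command -v tau_cc.sh >/dev/null 2>&1 && [ -f matmult.c ]; then
    tau_cc.sh -optTauSelectFile=select.tau -O3 -g -o matmult matmult.c
fi
```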

Manual Instrumentation

The TAU Instrumentation API provides a means for users to place TAU routines in their source code to explicitly direct how TAU should instrument their application. There are over 125 routines available. For details, see the TAU documentation at https://www.cs.uoregon.edu/research/tau/docs/newguide/bk03rn01.html.

Output

1. Profiling

TAU profiling output consists of a set of files named profile.X.Y.Z where:
X = MPI rank number
Y = context
Z = thread number
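
When scripting around these files, the three fields can be recovered from the name with shell parameter expansion. The following is a minimal sketch for a Bourne-style shell; profile_fields is a hypothetical helper name, not a TAU utility, and it assumes the profile.X.Y.Z naming described above.

```shell
# Hypothetical helper, not part of TAU: print the rank, context and
# thread encoded in a profile file name of the form profile.X.Y.Z.
profile_fields() {
    base=${1##*/}            # strip any leading directory
    rest=${base#profile.}    # X.Y.Z
    rank=${rest%%.*}         # X
    rest=${rest#*.}          # Y.Z
    context=${rest%%.*}      # Y
    thread=${rest#*.}        # Z
    echo "rank=$rank context=$context thread=$thread"
}

profile_fields MULTI__PAPI_L1_DCM/profile.3.0.0
# prints: rank=3 context=0 thread=0
```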

pprof

To get a quick, text-based summary of your job's profile data, use the TAU pprof utility. By default, it processes all of the profile.* files in the current directory and produces a report showing profile data for each rank/context/thread. An example for one MPI rank is shown below:

% use tau
Prepending: tau (ok)
% pprof
NODE 1;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call
---------------------------------------------------------------------------------------
100.0            5        4,489           1           1    4489358 .TAU application
 99.9            5        4,484           1        4432    4484326 main
61.9        2,778        2,778         461           0       6028 multiply_matrices
32.8        1,472        1,473           1          44    1473099 MPI_Init()
  2.8          126          126        3010           0         42 MPI_Bcast()
  1.7           77           77         462           0        168 MPI_Recv()
  0.3           11           11         461           0         25 MPI_Send()
  0.3           11           11           1          48      11425 MPI_Finalize()
  0.0        0.171        0.171           5           0         34 MPI_Allgather()
  0.0        0.067         0.07           5          15         14 MPI_Comm_split()
  0.0        0.053        0.058          10          21          6 MPI_Comm_create()
  0.0        0.052        0.052          10           0          5 MPI_Allreduce()
  0.0        0.016        0.016          52           0          0 MPI_Errhandler_set()
  0.0         0.01         0.01          10           0          1 MPI_Group_incl()
  0.0        0.009        0.009          12           0          1 MPI_Comm_free()
  0.0        0.008        0.008           6           0          1 MPI_Type_contiguous()
  0.0        0.006        0.006          15           0          0 MPI_Group_free()
  0.0        0.004        0.004           4           0          1 MPI_Attr_put()
  0.0        0.004        0.004           5           0          1 MPI_Type_struct()
  0.0        0.003        0.003          13           0          0 MPI_Comm_rank()
  0.0        0.003        0.003          11           0          0 MPI_Type_commit()
  0.0        0.001        0.001           5           0          0 MPI_Comm_group()
  0.0            0            0           1           0          0 MPI_Comm_size()
---------------------------------------------------------------------------------------

USER EVENTS Profile :NODE 1, CONTEXT 0, THREAD 0
---------------------------------------------------------------------------------------
NumSamples   MaxValue   MinValue  MeanValue  Std. Dev.  Event Name
---------------------------------------------------------------------------------------
         5          4          4          4          0  Message size for all-gather
        10          4          4          4          0  Message size for all-reduce
      3010    2.4E+04          4  2.392E+04       1381  Message size for broadcast
       462    2.4E+04          8  2.395E+04       1115  Message size received from all nodes
       461    2.4E+04    2.4E+04    2.4E+04          0  Message size sent to all nodes
---------------------------------------------------------------------------------------

ParaProf

ParaProf is TAU's graphical profile analysis utility. To use ParaProf:

1. Go to the directory containing the profile.* files.

2. If you have not already done so, load the TAU environment and then issue the paraprof command:

% use tau
Prepending: tau (ok)
% paraprof

3. A manager window and profile summary window will appear. Left/right clicking on items in the summary window allows you to view different types of information, as do the various menu selections.

ParaProf includes many features for diving more deeply into your application's behavior - see the TAU Documentation for details.

2. Tracing

TAU tracing output consists of two sets of files named tautrace.X.Y.Z.trc and events.X.edf where:
X = MPI rank number
Y = context
Z = thread number

Before TAU's trace files can be viewed, they must first be merged (for parallel jobs) and then converted to a suitable format for viewing by selected trace viewing tools. Two trace viewing tools are covered here.
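
A quick sanity check before merging is to confirm that every trace file has an event file for the same rank. The sketch below, for a Bourne-style shell, assumes file names of the form tautrace.X.Y.Z.trc and events.X.edf; check_trace_pairs is a hypothetical helper, not a TAU utility.

```shell
# Hypothetical helper, not part of TAU: list trace files whose rank has
# no matching events.X.edf file. Returns 0 only if nothing is missing.
check_trace_pairs() {
    dir=${1:-.}
    missing=0
    for trc in "$dir"/tautrace.*.trc; do
        [ -e "$trc" ] || continue           # the glob matched nothing
        base=${trc##*/}                      # tautrace.X.Y.Z.trc
        rest=${base#tautrace.}               # X.Y.Z.trc
        rank=${rest%%.*}                     # X
        if [ ! -e "$dir/events.$rank.edf" ]; then
            echo "missing events.$rank.edf for $base"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Check the current directory and report the result.
if check_trace_pairs .; then
    echo "all trace files have an event file"
fi
```

Run it in the directory that holds the trace files, or pass that directory as the first argument.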

Vampir/VampirServer

The Vampir trace visualizer provides a variety of means of examining OTF trace data, as generated through VampirTrace, OpenSpeedShop, or TAU. VampirServer is a client/server version of Vampir that can quickly extract and analyze data from large trace files by using a parallel backend. See the Vampir Documentation for details.

1. If you have not already done so, load the TAU and Vampir environments:

% use tau
Prepending: tau (ok)
% use vampir
Prepending: vampir (ok)

2. Go to the directory containing the TAU tracefiles and issue the tau_treemerge.pl command. This will merge all tautrace.* and events.* files into a single tau.trc file and a single tau.edf file.

% tau_treemerge.pl
/usr/global/tools/tau/training/tau-2.23.1/x86_64/bin/tau_merge -m tau.edf -e events.0.edf events.1.edf
events.2.edf events.3.edf events.4.edf events.5.edf events.6.edf events.7.edf tautrace.0.0.0.trc
tautrace.1.0.0.trc tautrace.2.0.0.trc tautrace.3.0.0.trc tautrace.4.0.0.trc tautrace.5.0.0.trc
tautrace.6.0.0.trc tautrace.7.0.0.trc tau.trc
tautrace.0.0.0.trc: 51511 records read.
tautrace.1.0.0.trc: 21897 records read.
tautrace.2.0.0.trc: 21897 records read.
tautrace.3.0.0.trc: 21897 records read.
tautrace.4.0.0.trc: 21099 records read.
tautrace.5.0.0.trc: 21113 records read.
tautrace.6.0.0.trc: 21099 records read.
tautrace.7.0.0.trc: 21099 records read. 

3. Convert the two merged TAU trace files into the Vampir otf format using the tau2otf utility:

% tau2otf tau.trc tau.edf matmult.otf 

4. Launch Vampir using the name of your otf file: 

% vampir matmult.otf

5. The Vampir main window will appear, allowing you to examine your application's trace events.

Jumpshot

The Jumpshot trace viewer from Argonne National Laboratory is included with the TAU installation.

1. If you have not already done so, load the TAU environment:

% use tau
Prepending: tau (ok) 

2. Go to the directory containing the TAU tracefiles and issue the tau_treemerge.pl command. This will merge all tautrace.* and events.* files into a single tau.trc file and a single tau.edf file.

% tau_treemerge.pl
/usr/global/tools/tau/training/tau-2.23.1/x86_64/bin/tau_merge -m tau.edf -e events.0.edf events.1.edf
events.2.edf events.3.edf events.4.edf events.5.edf events.6.edf events.7.edf tautrace.0.0.0.trc
tautrace.1.0.0.trc tautrace.2.0.0.trc tautrace.3.0.0.trc tautrace.4.0.0.trc tautrace.5.0.0.trc
tautrace.6.0.0.trc tautrace.7.0.0.trc tau.trc
tautrace.0.0.0.trc: 51511 records read.
tautrace.1.0.0.trc: 21897 records read.
tautrace.2.0.0.trc: 21897 records read.
tautrace.3.0.0.trc: 21897 records read.
tautrace.4.0.0.trc: 21099 records read.
tautrace.5.0.0.trc: 21113 records read.
tautrace.6.0.0.trc: 21099 records read.
tautrace.7.0.0.trc: 21099 records read. 

3. Convert the two merged TAU trace files into the Jumpshot slog2 format using the tau2slog2 utility:

% tau2slog2 tau.trc tau.edf -o matmult.slog2

4. Launch Jumpshot using the name of your slog2 file:

% jumpshot matmult.slog2

5. The Jumpshot main window will appear, allowing you to examine your application's trace events.

Compiling and Linking

As discussed in the Quick Start section above, TAU can be used to instrument applications without any special need to compile or link. However, using some TAU features does require compiling and linking with TAU components. For the most part, all of this is accomplished by using TAU Makefiles and compiler wrappers as described in steps 1 through 5 under Profiling with TAU Makefiles. Additional information can be found in the TAU Documentation.

Run-time Options

TAU has over 30 environment variables that can be used to control run-time behaviors such as:

  • Where to write profile/trace files
  • Which events/metrics to profile
  • Depth of call path to profile
  • Throttling
  • Verbosity/feedback
  • Sampling parameters
  • And more...

These are described in the TAU documentation at: https://www.cs.uoregon.edu/research/tau/docs/newguide/bk03apa.html.

Troubleshooting

  • TAU is a complex toolkit, and as such, troubleshooting problems may be difficult for the average user.
  • The most common problem is forgetting to load the TAU environment using the use tau command.
  • Most problems, if not easily resolved, should be reported to the LC Hotline.
  • These may be referred to the TAU development team under LC's support contract.

Documentation and References

The most important TAU links are listed below. Searching the web will find additional TAU documentation and presentations hosted by third parties.

LLNL-WEB-670397