Gprof is a performance analysis tool used to profile applications to determine where time is spent during program execution. Gprof is included with most Unix/Linux implementations, is simple to use, and can quickly show which parts of an application take the most time (hotspots). Gprof works by automatically instrumenting your code during compilation, and then sampling the application's program counter during execution. Sampling data is saved in a file, typically named gmon.out, which can then be read by the gprof command. Gprof was developed as an extension of the prof command, by providing a call graph of the program's execution.

Platforms and Locations

Platform Location
x86_64 Linux /usr/bin/gprof
CORAL /usr/bin/gprof

Quick Start

Using Gprof is a simple three or four step process, as shown below.

1. First, compile/link your application as you normally would, but be sure to include the -pg flag. Note that if you compile and link as separate steps in your application build, you will need to include -pg in both steps.

% gcc -O3 -pg -o myprog myprog.c 

2. For parallel applications only: if you want each parallel process to produce its own output file, you will need to set the undocumented environment variable GMON_OUT_PREFIX to some non-null string. For example:

% setenv GMON_OUT_PREFIX 'gmon.out'

3. Run your application as usual. The example below shows a 16 task MPI application running in the pdebug partition on a Linux cluster.

% srun -n16 -ppdebug myprog 

4. View the results: use the gprof command to convert the output file(s) into a human-readable report. Several examples are shown below:

  1. Serial
  2. Parallel - one process only
  3. Parallel - multiple processes
  4. Parallel - all processes
1% gprof myprog gmon.out
2% gprof myprog gmon.out.18302
3% gprof myprog gmon.out.18297 gmon.out.18300 gmon.out.9097
4% gprof myprog gmon.out.* 

Another option for parallel programs is to sum all output files into a single gmon.sum file which can then be viewed with grof. For example:

% gprof myprog -s gmon.out.*
% gprof myprog gmon.sum 

Output

Gprof's output consists of a human-readable report produced from the binary sampling file(s) created when the application was executed. By default the report is written to stdout, which can always be redirected to a file with a name of your choosing. For parallel programs, the report is a summation of the files used to produce it.

Gprof's report consists of three sections, each of which is discussed below. Note that there are a number of options available to customize Gprof's report - see the Gprof Documentation for details.

Flat Profile

The Flat Profile section shows the total amount of time your program spent executing each function, sorted by decreasing run-time. Unless the -z option is used, functions with no apparent time spent in them, and no apparent calls to them, are not mentioned. Note that if a function was not compiled for profiling, and didn't run long enough to show up on the program counter histogram, it will be indistinguishable from a function that was never called. An example Flat Profile section is shown below. Note that the report includes an explanation of what each column means.

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
time   seconds   seconds    calls   s/call   s/call  name
28.25      0.50     0.50     1031     0.00     0.00  advanceParticleGroup
11.30      0.70     0.20     1722     0.00     0.00  scatterParticleGroup
  7.34      0.83     0.13  1235596     0.00     0.00  contributeCurrent
  5.65      0.93     0.10     4127     0.00     0.00  indexSort
  3.67      1.00     0.07  2950376     0.00     0.00  getEcenterField
  3.39      1.06     0.06  9638242     0.00     0.00  getCell
  2.26      1.10     0.04   896527     0.00     0.00  matrixXmatrix
  1.98      1.13     0.04  3572325     0.00     0.00  getEcenter
...
[text deleted]
...
  0.00      1.77     0.00        1     0.00     0.00  setSurfaceCells
  0.00      1.77     0.00        1     0.00     0.00  setTemporalParameters
  0.00      1.77     0.00        1     0.00     0.00  setTimeStep
  0.00      1.77     0.00        1     0.00     0.00  setVeryThinConductingObjects
  0.00      1.77     0.00        1     0.00     0.00  setupMediumModels
  0.00      1.77     0.00        1     0.00     1.68  timeloop

%         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
seconds   for by this function and those listed above it.

self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

total     the average number of milliseconds spent in this
ms/call    function and its descendants per call, if this
           function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.

Call Graph

granularity: each sample hit covers 2 byte(s) for 0.03% of 28.91 seconds
index % time    self  children    called     name
                0.00   16.73       1/1           .__scalbn [2]
[1]     57.9    0.00   16.73       1         .sphot [1]
               12.79    3.80       1/1           .execute [3]
                0.11    0.00       1/1           .copypriv [14]
                0.01    0.00       1/1           .genmesh [34]
                0.01    0.00       1/1           .copyglob [33]
                0.00    0.00       1/1           .seedranf [44]
                0.00    0.00       1/6184794     .ranf [5]
                0.00    0.00      10/13          .second [49]
                0.00    0.00       1/1           .allocdyn [51]
                0.00    0.00       1/1           .rdinput [57]
                0.00    0.00       1/1           .genxsec [52]
                0.00    0.00       1/1           .writeout [59]
-----------------------------------------------
                                                
[2]     57.9    0.00   16.73                 .__scalbn [2]
                0.00   16.73       1/1           .sphot [1]
-----------------------------------------------
               12.79    3.80       1/1           .sphot [1]
[3]     57.4   12.79    3.80       1         .execute [3]
                0.24    3.55 6184792/6184794     .ranf [5]
                0.02    0.00      24/24          .plnkut [24]
                0.00    0.00       1/1           .copyseed [46]
...
[text deleted]
...
-----------------------------------------------
                0.00    0.00       1/1           .sphot [1]
[57]     0.0    0.00    0.00       1         .rdinput [57]
                0.00    0.00       1/1           .rdopac [58]
-----------------------------------------------
                0.00    0.00       1/1           .rdinput [57]
[58]     0.0    0.00    0.00       1         .rdopac [58]
-----------------------------------------------
                0.00    0.00       1/1           .sphot [1]
[59]     0.0    0.00    0.00       1         .writeout [59]
                0.00    0.00      64/64          .wroutput [47]
                0.00    0.00       1/13          .second [49]
-----------------------------------------------
                0.00    0.00       1/1           .genmesh [34]
[60]     0.0    0.00    0.00       1         .zonevols [60]
-----------------------------------------------

Function Index

Index by function name

  [19] .(block data)_pinit    [28] ._atanf                 [6] .pranf
  [35] .EndIORWFmt            [41] ._cos                  [17] .pthread_mutex_lock
  [20] .FormatControl         [29] ._cosf                  [5] .ranf
  [36] .GetMyLocalContext     [16] ._log                  [45] .ranfatok
  [25] .GetUnit               [10] ._mcount               [55] .ranfk
  [37] .IOReadRealAndScan     [30] ._xlfBeginIO           [56] .ranfkbinary
  [38] .IOSetRecordOffset     [23] .aix_atof               [8] .ranfmodmult
  [26] .PAMI_Client_destroy   [51] .allocdyn              [43] .rans
  [39] .__brk                 [33] .copyglob              [57] .rdinput
  [11] .__cos                 [14] .copypriv              [58] .rdopac
  [27] .__fxstat64            [46] .copyseed              [49] .second
  [12] .__ieee754_lgamma_r     [3] .execute               [44] .seedranf
   [7] .__ieee754_log         [34] .genmesh                [1] .sphot
  [21] .__lgamma_r            [52] .genxsec               [59] .writeout
   [9] .__log                 [50] .interp                [47] .wroutput
  [40] .__log1p               [53] .iranfeven             [60] .zonevols
   [4] .__mcount_internal     [48] .iranfodd              [32] __close_nocancel
  [15] .__profile_frequency   [54] .nisodd                [42] __open_nocancel
  [22] .__pthread_mutex_unlock_usercnt [31] .omp_test_nest_lock [18] __read_nocancel
  [13] .__sin                 [24] .plnkut

Compiling and Linking

As discussed in the Quick Start section above, the-pg flag is required when compiling and linking. For applications where compiling and linking are separate steps, -pg must be included in both steps.

Run-time Operations

Gprof provides a number of options that allow you to:

  • Customize the output format and style
  • Specify how Gprof analyzes its data
  • Specify debugging/diagnostic output while Gprof performs its work

For details on these options, see the Gprof Documentation.

One important, usually undocumented Gprof run-time option is the GMON_OUT_PREFIX environment variable. It is used to produce a unique sampling data file for each process in a parallel application. See step 2 under the Quick Start section above.

Troubleshooting

  • Consult the GNU documentation (below)
  • Contact the LC Hotline to report a problem.

Documentation and References