Gprof is a performance analysis tool used to profile applications to determine where time is spent during program execution. Gprof is included with most Unix/Linux implementations, is simple to use, and can quickly show which parts of an application take the most time (hotspots). Gprof works by automatically instrumenting your code during compilation, and then sampling the application's program counter during execution. Sampling data is saved in a file, typically named gmon.out, which can then be read by the gprof command. Gprof was developed as an extension of the prof command; its main addition is a call graph of the program's execution.
Platforms and Locations
Platform | Location |
---|---|
x86_64 Linux | /usr/bin/gprof |
CORAL | /usr/bin/gprof |
Quick Start
Using Gprof is a simple three- or four-step process, as shown below.
1. First, compile/link your application as you normally would, but be sure to include the -pg flag. Note that if you compile and link as separate steps in your application build, you will need to include -pg in both steps.
% gcc -O3 -pg -o myprog myprog.c
2. For parallel applications only: if you want each parallel process to produce its own output file, you will need to set the undocumented environment variable GMON_OUT_PREFIX to some non-null string. For example:
% setenv GMON_OUT_PREFIX 'gmon.out'
3. Run your application as usual. The example below shows a 16 task MPI application running in the pdebug partition on a Linux cluster.
% srun -n16 -ppdebug myprog
4. View the results: use the gprof command to convert the output file(s) into a human-readable report. Several examples are shown below:
- Serial:
% gprof myprog gmon.out
- Parallel - one process only:
% gprof myprog gmon.out.18302
- Parallel - multiple processes:
% gprof myprog gmon.out.18297 gmon.out.18300 gmon.out.9097
- Parallel - all processes:
% gprof myprog gmon.out.*
Another option for parallel programs is to sum all output files into a single gmon.sum file, which can then be viewed with gprof. For example:
% gprof myprog -s gmon.out.*
% gprof myprog gmon.sum
Output
Gprof's output consists of a human-readable report produced from the binary sampling file(s) created when the application was executed. By default the report is written to stdout, which can always be redirected to a file with a name of your choosing. For parallel programs, the report is a summation of the files used to produce it.
Gprof's report consists of three sections, each of which is discussed below. Note that there are a number of options available to customize Gprof's report - see the Gprof Documentation for details.
Flat Profile
The Flat Profile section shows the total amount of time your program spent executing each function, sorted by decreasing run-time. Unless the -z option is used, functions with no apparent time spent in them, and no apparent calls to them, are not mentioned. Note that if a function was not compiled for profiling, and didn't run long enough to show up on the program counter histogram, it will be indistinguishable from a function that was never called. An example Flat Profile section is shown below. Note that the report includes an explanation of what each column means.
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 28.25      0.50     0.50     1031     0.00     0.00  advanceParticleGroup
 11.30      0.70     0.20     1722     0.00     0.00  scatterParticleGroup
  7.34      0.83     0.13  1235596     0.00     0.00  contributeCurrent
  5.65      0.93     0.10     4127     0.00     0.00  indexSort
  3.67      1.00     0.07  2950376     0.00     0.00  getEcenterField
  3.39      1.06     0.06  9638242     0.00     0.00  getCell
  2.26      1.10     0.04   896527     0.00     0.00  matrixXmatrix
  1.98      1.13     0.04  3572325     0.00     0.00  getEcenter
... [text deleted] ...
  0.00      1.77     0.00        1     0.00     0.00  setSurfaceCells
  0.00      1.77     0.00        1     0.00     0.00  setTemporalParameters
  0.00      1.77     0.00        1     0.00     0.00  setTimeStep
  0.00      1.77     0.00        1     0.00     0.00  setVeryThinConductingObjects
  0.00      1.77     0.00        1     0.00     0.00  setupMediumModels
  0.00      1.77     0.00        1     0.00     1.68  timeloop

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.

 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendants per call, if this
           function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.
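To make the "% time" column concrete, the sketch below recomputes it for the first two rows of the example above as self seconds divided by the total run time (1.77 seconds, the final cumulative value). The helper name share is hypothetical and awk is assumed to be available.

```shell
total=1.77   # last value in the "cumulative seconds" column

# share SELF: percentage of total run time attributed to one function.
share() {
    awk -v s="$1" -v t="$total" 'BEGIN { printf "%.2f", 100 * s / t }'
}

advance=$(share 0.50)   # advanceParticleGroup
scatter=$(share 0.20)   # scatterParticleGroup
echo "advanceParticleGroup ${advance}%  scatterParticleGroup ${scatter}%"
```

These two rows reproduce the reported 28.25 and 11.30. Other rows may not match exactly, because the seconds columns in the report are themselves rounded to two decimals.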
Call Graph
granularity: each sample hit covers 2 byte(s) for 0.03% of 28.91 seconds

index % time    self  children    called     name
                0.00   16.73       1/1           .__scalbn [2]
[1]     57.9    0.00   16.73       1         .sphot [1]
               12.79    3.80       1/1           .execute [3]
                0.11    0.00       1/1           .copypriv [14]
                0.01    0.00       1/1           .genmesh [34]
                0.01    0.00       1/1           .copyglob [33]
                0.00    0.00       1/1           .seedranf [44]
                0.00    0.00       1/6184794     .ranf [5]
                0.00    0.00      10/13          .second [49]
                0.00    0.00       1/1           .allocdyn [51]
                0.00    0.00       1/1           .rdinput [57]
                0.00    0.00       1/1           .genxsec [52]
                0.00    0.00       1/1           .writeout [59]
-----------------------------------------------
[2]     57.9    0.00   16.73                 .__scalbn [2]
                0.00   16.73       1/1           .sphot [1]
-----------------------------------------------
               12.79    3.80       1/1           .sphot [1]
[3]     57.4   12.79    3.80       1         .execute [3]
                0.24    3.55 6184792/6184794     .ranf [5]
                0.02    0.00      24/24          .plnkut [24]
                0.00    0.00       1/1           .copyseed [46]
... [text deleted] ...
-----------------------------------------------
                0.00    0.00       1/1           .sphot [1]
[57]     0.0    0.00    0.00       1         .rdinput [57]
                0.00    0.00       1/1           .rdopac [58]
-----------------------------------------------
                0.00    0.00       1/1           .rdinput [57]
[58]     0.0    0.00    0.00       1         .rdopac [58]
-----------------------------------------------
                0.00    0.00       1/1           .sphot [1]
[59]     0.0    0.00    0.00       1         .writeout [59]
                0.00    0.00      64/64          .wroutput [47]
                0.00    0.00       1/13          .second [49]
-----------------------------------------------
                0.00    0.00       1/1           .genmesh [34]
[60]     0.0    0.00    0.00       1         .zonevols [60]
-----------------------------------------------
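In the call graph, entries are separated by dashed lines. Within an entry, the line that begins with a bracketed index is the function being described, the lines above it are its callers, and the lines below it are the functions it calls. The "% time" value on that line is the share of total run time spent in the function plus its children. The sketch below checks this for two entries of the example above; the helper name pct is hypothetical and awk is assumed to be available.

```shell
total=28.91   # total sampled seconds, from the "granularity" line

# pct SELF CHILDREN: share of total time spent in a function and its callees.
pct() {
    awk -v s="$1" -v c="$2" -v t="$total" 'BEGIN { printf "%.1f", 100 * (s + c) / t }'
}

sphot=$(pct 0.00 16.73)     # primary line of entry [1]
execute=$(pct 12.79 3.80)   # primary line of entry [3]
echo ".sphot ${sphot}%  .execute ${execute}%"
```

The results, 57.9 and 57.4, agree with the "% time" column for entries [1] and [3].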
Function Index
Index by function name

  [19] .(block data)_pinit              [28] ._atanf               [6] .pranf
  [35] .EndIORWFmt                      [41] ._cos                [17] .pthread_mutex_lock
  [20] .FormatControl                   [29] ._cosf                [5] .ranf
  [36] .GetMyLocalContext               [16] ._log                [45] .ranfatok
  [25] .GetUnit                         [10] ._mcount             [55] .ranfk
  [37] .IOReadRealAndScan               [30] ._xlfBeginIO         [56] .ranfkbinary
  [38] .IOSetRecordOffset               [23] .aix_atof             [8] .ranfmodmult
  [26] .PAMI_Client_destroy             [51] .allocdyn            [43] .rans
  [39] .__brk                           [33] .copyglob            [57] .rdinput
  [11] .__cos                           [14] .copypriv            [58] .rdopac
  [27] .__fxstat64                      [46] .copyseed            [49] .second
  [12] .__ieee754_lgamma_r               [3] .execute             [44] .seedranf
   [7] .__ieee754_log                   [34] .genmesh              [1] .sphot
  [21] .__lgamma_r                      [52] .genxsec             [59] .writeout
   [9] .__log                           [50] .interp              [47] .wroutput
  [40] .__log1p                         [53] .iranfeven           [60] .zonevols
   [4] .__mcount_internal               [48] .iranfodd            [32] __close_nocancel
  [15] .__profile_frequency             [54] .nisodd              [42] __open_nocancel
  [22] .__pthread_mutex_unlock_usercnt  [31] .omp_test_nest_lock  [18] __read_nocancel
  [13] .__sin                           [24] .plnkut
Compiling and Linking
As discussed in the Quick Start section above, the -pg flag is required when compiling and linking. For applications where compiling and linking are separate steps, -pg must be included in both steps.
Run-time Operations
Gprof provides a number of options that allow you to:
- Customize the output format and style
- Specify how Gprof analyzes its data
- Specify debugging/diagnostic output while Gprof performs its work
For details on these options, see the Gprof Documentation.
One important, usually undocumented Gprof run-time option is the GMON_OUT_PREFIX environment variable. It is used to produce a unique sampling data file for each process in a parallel application. See step 2 under the Quick Start section above.
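The Quick Start uses csh syntax (setenv). For Bourne-style shells such as sh or bash, an equivalent sketch follows; the prefix value gmon.out is just the one used in the Quick Start, and any non-empty string should work.

```shell
# Bourne-style shells (sh, bash, ksh, zsh): equivalent of the csh setenv in the Quick Start.
export GMON_OUT_PREFIX=gmon.out

# Each profiled process then writes its own file named <prefix>.<pid>,
# for example gmon.out.18302, rather than every process writing gmon.out.
echo "profiles will be written as ${GMON_OUT_PREFIX}.<pid>"
```

Set the variable in the shell or batch script that launches the application so that every process inherits it.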
Troubleshooting
- Consult the GNU documentation (below)
- Contact the LC Hotline to report a problem.
Documentation and References
- GNU Gprof documentation
- Gprof man page
- Searching the Web will turn up Gprof tutorials and more.