Profiling using gprof

From HPC users
Jump to navigationJump to search

Profiling using gprof

The GNU profiler gprof is a profiling tool which might be used to identify those parts of your program that are especially time consuming. It is thus a valuable tool that helps to invest your time to improve the right parts of your code in order to eventually decrease the running time of your application.

A profiling cycle consists of the following three steps:

  • compiling the program for profiling,
  • executing the program,
  • postprocessing the profiling raw data using gprof.

All three steps are subsequently illustrated by means of a small example program.

A basic example program

As a basic example program to illustrate the use of the gprof profiling tool, consider the following source code of a program written in C, here referred to as myExample.c:

 
#include <stdio.h>
#include <stdlib.h>

/* function prototypes */
int fib_rec(int);
int fib_dyn(int);

/* fib_rec()
 * Compute Fibonacci number fN for specified index N
 *
 * NOTE:
 * Recursive programming variant to compute fN
 * 
 * Input:
 * N    - Fibonacci index
 * 
 * Returns: (fN)
 * fN   - Fibonacci number for index N
 */
int fib_rec(int N){
  if(N<2){ 
    return N;
  }
  else{  
    return fib_rec(N-1)+fib_rec(N-2); 
  }
}

/* fib_dyn()
 * Compute Fibonacci number fN for specified index N
 *
 * NOTE:
 * Dynamic programming variant to compute fN
 * 
 * Input:
 * N    - Fibonacci index
 * 
 * Returns: (fN)
 * fN   - Fibonacci number for index N
 */
int fib_dyn(int N){
  int *myArray,i,fN;

  if(N<2){ 
    return N;
  }

  myArray=(int *) malloc((N+1)*sizeof(int));

  myArray[0]=0;
  myArray[1]=1;
  for(i=2;i<=N;i++){
    myArray[i]=myArray[i-1]+myArray[i-2];
  }
  fN=myArray[N];
  
  free(myArray);
  return fN;
}


int main(int argc, char *argv[]) {
  int N;

  N = atoi(argv[1]);

  printf("fib_rec(%d) = %d\n",N,fib_rec(N));
  printf("fib_dyn(%d) = %d\n",N,fib_dyn(N));
  return 0;
}
  

For clarity some documentation is also provided. The purpose of the code is quite simple: it compares two different ways to compute the Fibonacci number for a given index . In general, the sequence of Fibonacci numbers follows the recurrence relation . The recurrence can be started using and , see e.g. here. The above source code implements two different functions, following two different programming paradigms, to compute for a given input index : fib_rec, following a recursive programming approach, and, fib_dyn, following a dynamic programming approach. The main function defined within the source code simply calls both functions for a specified Fibonacci index and prints out the result. Since grows rather fast with increasing , the above implementation faces severe problems for large values of . However, even modest values of the Fibonacci index will do for the illustration of gprof.

Compiling for profiling

So as to benefit from gprof you need to compile and link (in case of several modules) your program so that it provides further information for profiling. Using gcc this is done by adding the compiler option -pg in addition to the options you usually use. For the above example you might thus type

gcc myExample.c -o myExample -pg

This then yields the executable myExample, compiled for profiling.

A basic profiling session

To retrieve the profiling information you first need to run the program. Here, calling the program for the Fibonacci index 30 yields the output

 
$ ./myExample 30
fib_rec(30) = 832040
fib_dyn(30) = 832040
  

and, most importantly, the file gmon.out. The latter file can be postprocessed via gprof to yield the profiling information. Therefore you simply need to type

 gprof ./myExample gmon.out

The details listed by gprof comprise the flat profile and the call graph.

Flat profile

To focus solely on the flat profile you can use the commandline option -p

 
$ gprof -p ./myExample gmon.out 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
100.74      0.01     0.01        1    10.07    10.07  fib_rec
  0.00      0.01     0.00        1     0.00     0.00  fib_dyn

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.
 
 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
           function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.
  

From the flat profile it should be evident that the program spends almost all of its time in the function fib_rec (usually these numbers will NOT add up to 100%). However, even for the time consuming part of this program, the absolute time listed is about 0.01 sec for the function calls to fib_rec which almost is too small for the time resolution used by gprof. However, note that it is hardly possible to improve the running time of the function fib_rec (here, even changing the function to an inline function will not help further).

Call graph

To focus on the call graph, only, you can use the commandline option -q. Below, the additional option -b was used to suppress lots of verbose statements:

 
$ gprof -q -b ./myExample gmon.out 
                        Call graph


granularity: each sample hit covers 2 byte(s) for 99.26% of 0.01 seconds

index % time    self  children    called     name
                             2692536             fib_rec [1]
                0.01    0.00       1/1           main [2]
[1]    100.0    0.01    0.00       1+2692536 fib_rec [1]
                             2692536             fib_rec [1]
-----------------------------------------------
                                                 <spontaneous>
[2]    100.0    0.00    0.01                 main [2]
                0.01    0.00       1/1           fib_rec [1]
                0.00    0.00       1/1           fib_dyn [3]
-----------------------------------------------
                0.00    0.00       1/1           main [2]
[3]      0.0    0.00    0.00       1         fib_dyn [3]
-----------------------------------------------


Index by function name

   [3] fib_dyn                 [1] fib_rec
  

All information needed to interpret the call graph are listed if the verbose statements are not suppressed by means of the commandline option -b.

As evident from the calling graph, the function fib_rec, listed as [1] is called 2692537 times during the computation of fib_rec(30). This can be seen from the column entitled called. For this quantity, gprof states that called is the number of times the function was called. If the function called itself recursively, the number only includes non-recursive calls, and is followed by a + and the number of recursive calls. This large number of recursive function calls explains why the majority of the running time is spent in that function. This reflects that during the recursive computation of a Fibonacci number, intermediate results are computed over and over again. In contrast to this, the function fib_dyn, listed as [3] is way more efficient in storing and reconsidering intermediate results. Hence its superior running time (which, however falls below the resolution of gprof).

For completeness lets have more detailed look at the first row:

 
index % time    self  children    called     name
                             2692536             fib_rec [1]
                0.01    0.00       1/1           main [2]
[1]    100.0    0.01    0.00       1+2692536 fib_rec [1]
                             2692536             fib_rec [1]
  

The first two entries refer to the parents of the function fib_rec (since the function called itself recursively, it is also a parent of itself). In the called column, the number of times the function returned to the respective parent function (specified in column name) is listed. So from the first entry it is evident that fib_rec returned 2692536 times to the parent function fib_rec. From the second line it can be seen that the function fib_rec returned 1 time to the parent function main (further detailed at index [2]). Entries three and four should be interpreted together. Entry three lists the number of times, the function fib_rec was called in the form non-recursive calls+recursive calls. Entry four, where the function name in column name is indented relative to the previous line, indicates that fib_rec has to be interpreted as a child function here. In this regard, the number listed in column called refers to the number of times, this child function was called.


Profiling while using shared libraries: sprof

When you link your code with a shared library, gprof can, unfortunately, not be used. Then you need to use the program sprof instead. sprof is a tool to read and display shared object profiling data.

Shared library example

Just for arguments, subdivide the example code listed above into three files, fibfunc.h:

 
#ifndef fib_h__
#define fib_h__

int fib_rec(int),
    fib_dyn(int);

#endif // fin_h__
  

fibfunc.c:

 
#include <stdlib.h>

int fib_rec(int N){
  if(N<2){ 
    return N;
  }
  else{  
    return fib_rec(N-1)+fib_rec(N-2); 
  }
}

int fib_dyn(int N){
  int *myArray,i,fN;

  if(N<2){ 
    return N;
  }

  myArray=(int *) malloc((N+1)*sizeof(int));

  myArray[0]=0;
  myArray[1]=1;
  for(i=2;i<=N;i++){
    myArray[i]=myArray[i-1]+myArray[i-2];
  }
  fN=myArray[N];
  
  free(myArray);
  return fN;
}
  

and myExample.c:

 
#include <stdio.h>
#include <stdlib.h>
#include "fibfuncs.h"


int main(int argc, char *argv[]) {
  int N;

  N = atoi(argv[1]);

  printf("fib_rec(%d) = %d\n",N,fib_rec(N));
  printf("fib_dyn(%d) = %d\n",N,fib_dyn(N));
  return 0;
}
  

Then, you might create a shared library called libfib.so by means of

 gcc -c -Wall -Werror -fpic fibfuncs.c -o fibfuncs.o
 gcc -shared -o libfib.so fibfuncs.o 
 gcc -Wall -o myExample myExample.c -L. -lfib 

Further, you need to amend the LD_LIBRARY_PATH to include the actual location of the shared object, e.g., by means of

 export LD_LIBRARY_PATH=/path-to-your-lib:$LD_LIBRARY_PATH

A basic profiling session

Then you are able to execute the binary myExample. So as to enable profiling you also need to set the LD_PROFILE environmnt variable to point to the proper shared object. Here,

 export LD_PROFILE=libfib.so

is appropriate. Then, running the binary will create a profiling file in /var/tmp/, see

 
$ ./myExample 30
fib_rec(30) = 832040
fib_dyn(30) = 832040
$ ls /var/tmp/
libfib.so.profile
  

which can be postprocessed by calling sprof for the shared object further specifying the raw data file according to:

 
$ sprof libfib.so libfib.so.profile
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
  0.00      0.00     0.00  2692537     0.00           fib_rec
  0.00      0.00     0.00        1     0.00           fib_dyn

index % time    self  children    called     name

[0]      0.0    0.00    0.00        0         malloc@@GLIBC_2.2.5 [0]
                0.00    0.00        1/1           fib_dyn [2]
                0.00    0.00        1/2692537     fib_rec [1]
-----------------------------------------------
                0.00    0.00  2692536/2692537     fib_rec [1]
                0.00    0.00        1/2692537     <UNKNOWN>
[1]      0.0    0.00    0.00  2692537         fib_rec [1]
                0.00    0.00  2692536/2692537     fib_rec [1]
-----------------------------------------------
                0.00    0.00        1/1           <UNKNOWN>
[2]      0.0    0.00    0.00        1         fib_dyn [2]
-----------------------------------------------
  

As above, the postprocessed profiling information can be split up into the flat profile and the call graph. Both can be interprated similar to the output of gprof.


Profiling Python programs

If you use the Python programming language, then profiling your program is quite an easy task. As Python comes with batteries included, you don't need a further program in order to profile your script. The Python module which enables you to profile your programs is called cProfile, see here. The most direct way to obtain profiling information is then to simply call the interpreter with an additional command line option. However, if you would like to profile particular functions or specific bits of code, only, you need to invest a bit more effort.

A basic example program

In order to illustrate how to profile programs written in the Python programming language, consider the subsequent example program, here referred to as fib.py:

 
import sys

def fib_rec(N):
        '''Compute Fibonacci number fN for specified index N

        NOTE:
        Recursive programming variant to compute fN

        Input:
        N    - Fibonacci index

        Returns: (fN)
        fN   - Fibonacci number for index N
        '''
        if N<2: return N
        else: return fib_rec(N-1)+fib_rec(N-2)


def fib_dyn(N):
        '''Compute Fibonacci number fN for specified index N

        NOTE:
        Dynamic programming variant to compute fN

        Input:
        N    - Fibonacci index

        Returns: (fN)
        fN   - Fibonacci number for index N
        '''
        if N<2: return N
        myArray =range(N+1)
        for i in range(2,N+1):
                myArray[i]=myArray[i-1]+myArray[i-2]
        return myArray[-1]


if __name__=="__main__":
        N = int(sys.argv[1])
        print "fib_rec(%d)=%d"%(N,fib_rec(N))
        print "fib_dyn(%d)=%d"%(N,fib_dyn(N))
  

The above Python script is similar to the previous example program written in C: it compares two different ways to compute the Fibonacci number for a given index . Therefore the program implements two different functions: fib_rec, following a recursive programming approach, and, fib_dyn, following a dynamic programming approach. For clarity, some documentation is provided with the functions. The main function defined at the end of the source code simply calls both functions for a specified Fibonacci index and prints out the results.

Profiling entire programs

In order to profile the full program you just need to invoke it (here for the Fibonacci index 30) similar to

 python -m cProfile -s time fib.py 30

The additional command line option -m cProfile enables the cProfile profiling tool (which is included with the standard Python modules). The further option -s time sorts the profiling results according to the total time spent in the respective modules or functions (time spent in subfunctions is excluded; default sorting is according to the function name):

 
$ python -m cProfile -s time fib.py 30
fib_rec(30)=832040
fib_dyn(30)=832040
         2692542 function calls (6 primitive calls) in 1.092 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2692537/1    1.092    0.000    1.092    1.092 fib.py:3(fib_rec)
        1    0.000    0.000    1.092    1.092 fib.py:1(<module>)
        1    0.000    0.000    0.000    0.000 fib.py:19(fib_dyn)
        2    0.000    0.000    0.000    0.000 {range}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  

As evident from the profiling output, the dynamic programming variant outperforms the recursive procedure by far. Also, the number of calls to the function fib_rec amounts to 2692537. Out of these, one call was invoked non-recursively. For completeness, note that you might store the profiling (raw-)data in a file and later postprocess that data using the pstat module, see here. This is especially useful if the execution of the program results in a lot of profiling data.

Profiling particular functions

A convenient way to profile particular functions, only, is to write a further profiling script. Within that script you might import the cProfile module directly and use, e.g., the run method implemented by that module. E.g., so as to profile the function fib_rec you might set up a script similar to the script called profileScript.py, listed below (you could do this via an interactive session as well, of course):

 
import cProfile
import fib

cProfile.run("fib.fib_rec(30)")
  

which then yields the output:

 
$ python profileSkript.py
         2692539 function calls (3 primitive calls) in 1.080 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.080    1.080 <string>:1(<module>)
2692537/1    1.080    0.000    1.080    1.080 fib.py:3(fib_rec)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
  

Note that you can specify an output file for the profiling data by changing the last line of profileScript.py to: cProfile.run("fib.fib_rec(30)","outFile.dat"). This will then yield the file outFile.dat, containing the profiling data, in the current working directory.

Profiling only parts of your code

If you want to profile only parts of a function or any other bit of code surrounded by further code you might proceed as follows: start a profiling session using the cProfile commands and directly disable that session after initializing it. Then, enable the profiling session only for the bits of code you want to profile. E.g., to profile only parts of the function fib_dyn you might assemble a script similar to profileScript2.py:

 
import sys
import cProfile

prof = cProfile.Profile()
prof.disable()

def fib_dyn(N):
        if N<2: return N
        myArray =range(N+1)
        prof.enable()
        for i in range(2,N+1):
                myArray[i]=myArray[i-1]+myArray[i-2]
        prof.disable()
        return myArray[-1]

def main():
        N = int(sys.argv[1])
        print "fib_dyn(%d)=%d"%(N,fib_dyn(N))
        prof.print_stats()

main()
  

Invoking the script in the usual way then yields:

 
$ python profileSkript2.py 30
fib_dyn(30)=832040
         2 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {range}
  

For this particular example, the running times falls below the resolution of the profiling tool, but the number of function calls can be interpreted well and the idea behind profiling bits of code using python should be clear.