Profiling using gprof

From HPC users
Revision as of 13:43, 20 September 2013 by Melchert (talk | contribs)
Jump to navigationJump to search

The GNU profiler gprof is a profiling tool which might be used to identify those parts of your program that are especially time consuming. It is thus a valuable tool that helps to invest your time to improve the right parts of your code in order to eventually decrease the running time of your application.

A profiling cycle consists of the following three steps:

  • compiling the program for profiling,
  • executing the program,
  • postprocessing the profiling raw data using gprof.

All three steps are subsequently illustrated by means of a small example program.

A basic example program

As a basic example program to illustrate the use of the gprof profiling tool, consider the following source code of a program written in C, here referred to as myExample.c:

 
#include <stdio.h>
#include <stdlib.h>

/* function prototypes */
int fib_rec(int);
int fib_dyn(int);

/* fib_rec()
 * Compute Fibonacci number fN for specified index N
 *
 * NOTE:
 * Recursive programming variant to compute fN
 * 
 * Input:
 * N    - Fibonacci index
 * 
 * Returns: (fN)
 * fN   - Fibonacci number for index N
 */
int fib_rec(int N){
  if(N<2){ 
    return N;
  }
  else{  
    return fib_rec(N-1)+fib_rec(N-2); 
  }
}

/* fib_dyn()
 * Compute Fibonacci number fN for specified index N
 *
 * NOTE:
 * Dynamic programming variant to compute fN
 * 
 * Input:
 * N    - Fibonacci index
 * 
 * Returns: (fN)
 * fN   - Fibonacci number for index N
 */
int fib_dyn(int N){
  int *myArray,i,fN;

  if(N<2){ 
    return N;
  }

  myArray=(int *) malloc((N+1)*sizeof(int));

  myArray[0]=0;
  myArray[1]=1;
  for(i=2;i<=N;i++){
    myArray[i]=myArray[i-1]+myArray[i-2];
  }
  fN=myArray[N];
  
  free(myArray);
  return fN;
}


int main(int argc, char *argv[]) {
  int N;

  N = atoi(argv[1]);

  printf("fib_rec(%d) = %d\n",N,fib_rec(N));
  printf("fib_dyn(%d) = %d\n",N,fib_dyn(N));
  return 0;
}
  

For clarity some documentation is also provided. The purpose of the code is quite simple: it compares two different ways to compute the Fibonacci number for a given index . In general, the sequence of Fibonacci numbers follows the recurrence relation started using and , see e.g. here. The above source code implements two different functions, following two different programming paradigms, to compute for a given input index : fib_rec, following a recursive programming approach, and, fib_dyn, following a dynamic programming approach. The main function defined within the source code simply calls both functions for a specified Fibonacci index and prints out the result. Since grows rather fast with increasing , the above implementation faces severe problems for large values of . However, even modest values of the Fibonacci index will do for the illustration of gprof.

Compiling for profiling

So as to benefit from gprof you need to compile and link (in case of several modules) your program so that it provides further information for profiling. Using gcc this is done by adding the compiler option -pg in addition to the options you usually use. For the above example you might thus type

gcc myExample.c -o myExample -pg

This then yields the executable myExample, compiled for profiling.

A basic profiling session

To retrieve the profiling information you first need to run the program. Here, calling the program for the Fibonacci index 30 yields the output

 
$ ./myExample 30
fib_rec(30) = 832040
fib_dyn(30) = 832040
  

and, most importantly, the file gmon.out. The latter file can be postprocessed via gprof to yield the profiling information. Therefore you simply need to type

 gprof ./myExample gmon.out

The details listed by gprof comprise the flat profile and the call graph.

Flat profile

To focus solely on the flat profile you can use the commandline option -p

 
$ gprof -p ./myExample gmon.out 
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
100.74      0.01     0.01        1    10.07    10.07  fib_rec
  0.00      0.01     0.00        1     0.00     0.00  fib_dyn

 %         the percentage of the total running time of the
time       program used by this function.

cumulative a running sum of the number of seconds accounted
 seconds   for by this function and those listed above it.

 self      the number of seconds accounted for by this
seconds    function alone.  This is the major sort for this
           listing.

calls      the number of times this function was invoked, if
           this function is profiled, else blank.
 
 self      the average number of milliseconds spent in this
ms/call    function per call, if this function is profiled,
           else blank.

 total     the average number of milliseconds spent in this
ms/call    function and its descendents per call, if this 
           function is profiled, else blank.

name       the name of the function.  This is the minor sort
           for this listing. The index shows the location of
           the function in the gprof listing. If the index is
           in parenthesis it shows where it would appear in
           the gprof listing if it were to be printed.
  

From the flat profile it should be evident that the program spends almost all of its time in the function fib_rec (I guess the 100.74% of the running time should be read as close to 100%). However, even for the time consuming part of this program, the absolute time listed is about 0.01 sec for the function calls to fib_rec which almost is too small for the time resolution used by gprof. However, note that it is hardly possible to improve the running time of the function fib_rec (here, even changing the function to an inline function will not help further).

Call graph

To focus on the call graph, only, you can use the commandline option -q. Below, the additional option -b was used to suppress lots of verbose statements:

 
$ gprof -q -b ./myExample gmon.out 
                        Call graph


granularity: each sample hit covers 2 byte(s) for 99.26% of 0.01 seconds

index % time    self  children    called     name
                             2692536             fib_rec [1]
                0.01    0.00       1/1           main [2]
[1]    100.0    0.01    0.00       1+2692536 fib_rec [1]
                             2692536             fib_rec [1]
-----------------------------------------------
                                                 <spontaneous>
[2]    100.0    0.00    0.01                 main [2]
                0.01    0.00       1/1           fib_rec [1]
                0.00    0.00       1/1           fib_dyn [3]
-----------------------------------------------
                0.00    0.00       1/1           main [2]
[3]      0.0    0.00    0.00       1         fib_dyn [3]
-----------------------------------------------


Index by function name

   [3] fib_dyn                 [1] fib_rec
  

All information needed to interpret the call graph are listed if the verbose statements are not suppressed by means of the commandline option -b.

As evident from the calling graph, the function fib_rec, listed as [1] is called 2692537 times during the computation of fib_rec(30). This can be seen from the column entitled called. For this quantity, gprof states that called is the number of times the function was called. If the function called itself recursively, the number only includes non-recursive calls, and is followed by a + and the number of recursive calls. This large number of recursive function calls explains why the majority of the running time is spent in that function. This reflects that during the recursive computation of a Fibonacci number, intermediate results are computed over and over again. In contrast to this, the function fib_dyn, listed as [3] is way more efficient in storing and reconsidering intermediate results. Hence its superior running time (which, however falls below the resolution of gprof).

For completeness lets have more detailed look at the first row:

 
index % time    self  children    called     name
                             2692536             fib_rec [1]
                0.01    0.00       1/1           main [2]
[1]    100.0    0.01    0.00       1+2692536 fib_rec [1]
                             2692536             fib_rec [1]
  

The first two entries refer to the parents of the function fib_rec (since the function called itself recursively, it is also a parent of itself). In the called column, the number of times the function returned to the respective parent function (specified in column name) is listed. So from the first entry it is evident that fib_rec returned 2692536 times to the parent function fib_rec. From the second line it can be seen that the function fib_rec returned 1 time to the parent function main (further detailed at index [2]). Entries three and four should be interpreted together. Entry three lists the number of times, the function fib_rec was called in the form non-recursive calls+recursive calls. Entry four, where the function name in column name is indented relative to the previous line, indicates that fib_rec has to be interpreted as a child function here. In this regard, the number listed in column called refers to the number of times, this child function was called.


Profiling while using shared libraries

When you link your code with a shared library, gprof can, unfortunately, not be used. Then you need to use the program sprof instead. sprof is a tool to read and display shared object profiling data.

Shared library example

Just for arguments, subdivide the example code listed above into three files, fibfunc.h:

 
#ifndef fib_h__
#define fib_h__

int fib_rec(int),
    fib_dyn(int);

#endif // fin_h__
  

fibfunc.c:

 
#include <stdlib.h>

int fib_rec(int N){
  if(N<2){ 
    return N;
  }
  else{  
    return fib_rec(N-1)+fib_rec(N-2); 
  }
}

int fib_dyn(int N){
  int *myArray,i,fN;

  if(N<2){ 
    return N;
  }

  myArray=(int *) malloc((N+1)*sizeof(int));

  myArray[0]=0;
  myArray[1]=1;
  for(i=2;i<=N;i++){
    myArray[i]=myArray[i-1]+myArray[i-2];
  }
  fN=myArray[N];
  
  free(myArray);
  return fN;
}
  

and myExample.c:

 
#include <stdio.h>
#include <stdlib.h>
#include "fibfuncs.h"


int main(int argc, char *argv[]) {
  int N;

  N = atoi(argv[1]);

  printf("fib_rec(%d) = %d\n",N,fib_rec(N));
  printf("fib_dyn(%d) = %d\n",N,fib_dyn(N));
  return 0;
}
  

Then, you might create a shared library called libfib.so by means of

 gcc -c -Wall -Werror -fpic fibfuncs.c -o fibfuncs.o
 gcc -shared -o libfib.so fibfuncs.o 
 gcc -Wall -o myExample myExample.c -L. -lfib 

Further, you need to amend the LD_LIBRARY_PATH to include the actual location of the shared object, e.g., by means of

 export LD_LIBRARY_PATH=/path-to-your-lib:$LD_LIBRARY_PATH

A basic profiling session

Then you are able to execute the binary myExample. So as to enable profiling information you also need to set the LD_PROFILE environmnt variable to point to the proper shared object. Here,

 export LD_PROFILE=libfib.so

is appropriate. Then, running the binary will create a profiling file in /var/tmp/, see

 
$ ./myExample 30
fib_rec(30) = 832040
fib_dyn(30) = 832040
$ ls /var/tmp/
libfib.so.profile
  

which can be postprocessed by calling sprof for the shared object further specifying the raw data file according to:

 
$ sprof libfib.so libfib.so.profile
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
  0.00      0.00     0.00  2692537     0.00           fib_rec
  0.00      0.00     0.00        1     0.00           fib_dyn

index % time    self  children    called     name

[0]      0.0    0.00    0.00        0         malloc@@GLIBC_2.2.5 [0]
                0.00    0.00        1/1           fib_dyn [2]
                0.00    0.00        1/2692537     fib_rec [1]
-----------------------------------------------
                0.00    0.00  2692536/2692537     fib_rec [1]
                0.00    0.00        1/2692537     <UNKNOWN>
[1]      0.0    0.00    0.00  2692537         fib_rec [1]
                0.00    0.00  2692536/2692537     fib_rec [1]
-----------------------------------------------
                0.00    0.00        1/1           <UNKNOWN>
[2]      0.0    0.00    0.00        1         fib_dyn [2]
-----------------------------------------------
  

As above, the postprocessed profiling information can be split up into the flat profile and the call graph. Both can be interprated similar to the output of gprof.