Difference between revisions of "Profiling using gprof"
Line 1: | Line 1: | ||
=Profiling using <tt>gprof</tt>= | |||
The GNU profiler [http://sourceware.org/binutils/docs/gprof/ gprof] is a | The GNU profiler [http://sourceware.org/binutils/docs/gprof/ gprof] is a | ||
profiling tool which might be used to identify those parts of your program that | profiling tool which might be used to identify those parts of your program that | ||
Line 237: | Line 239: | ||
= Profiling while using shared libraries: <tt>sprof</tt> = | |||
When you link your code with a shared library, <tt>gprof</tt> can, unfortunately, not be used. Then you need to | When you link your code with a shared library, <tt>gprof</tt> can, unfortunately, not be used. Then you need to | ||
Line 315: | Line 317: | ||
=== A basic profiling session === | === A basic profiling session === | ||
Then you are able to execute the binary <tt>myExample</tt>. So as to enable profiling | Then you are able to execute the binary <tt>myExample</tt>. So as to enable profiling | ||
you also need to set the <tt>LD_PROFILE</tt> environmnt variable to point to the proper shared object. | you also need to set the <tt>LD_PROFILE</tt> environmnt variable to point to the proper shared object. | ||
Here, | Here, | ||
Line 356: | Line 358: | ||
As above, the postprocessed profiling information can be split up into the ''flat profile'' and | As above, the postprocessed profiling information can be split up into the ''flat profile'' and | ||
the ''call graph''. Both can be interprated similar to the output of <tt>gprof</tt>. | the ''call graph''. Both can be interprated similar to the output of <tt>gprof</tt>. | ||
=Profiling Python programs= | |||
If you use the Python programming language, then profiling your program is quite an easy task. | |||
As Python comes with ''batteries included'', you don't need a further program in order | |||
to profile your script. The Python module which enables you to profile your programs is | |||
called <tt>cProfile</tt>, see [http://docs.python.org/2/library/profile.html here]. | |||
The most direct way to obtain profiling information is then to simply call the interpreter with an | |||
additional command line option. However, if you would like to profile particular functions or specific | |||
bits of code, only, you need to invest a bit more effort. | |||
== A basic example program == | |||
In order to illustrate how to profile programs written in the Python programming language, | |||
consider the subsequent example program, here referred to as <tt>fib.py</tt>: | |||
<nowiki> | |||
import sys | |||
def fib_rec(N): | |||
'''Compute Fibonacci number fN for specified index N | |||
NOTE: | |||
Recursive programming variant to compute fN | |||
Input: | |||
N - Fibonacci index | |||
Returns: (fN) | |||
fN - Fibonacci number for index N | |||
''' | |||
if N<2: return N | |||
else: return fib_rec(N-1)+fib_rec(N-2) | |||
def fib_dyn(N): | |||
'''Compute Fibonacci number fN for specified index N | |||
NOTE: | |||
Dynamic programming variant to compute fN | |||
Input: | |||
N - Fibonacci index | |||
Returns: (fN) | |||
fN - Fibonacci number for index N | |||
''' | |||
if N<2: return N | |||
myArray =range(N+1) | |||
for i in range(2,N+1): | |||
myArray[i]=myArray[i-1]+myArray[i-2] | |||
return myArray[-1] | |||
if __name__=="__main__": | |||
N = int(sys.argv[1]) | |||
print "fib_rec(%d)=%d"%(N,fib_rec(N)) | |||
print "fib_dyn(%d)=%d"%(N,fib_dyn(N)) | |||
</nowiki> | |||
The above Python script is similar to the previous example program written in C: | |||
it compares two different ways to compute the Fibonacci number <math>F_N</math> | |||
for a given index <math>N</math>. Therefore the program implements two | |||
different functions: <tt>fib_rec</tt>, following a recursive programming | |||
approach, and, <tt>fib_dyn</tt>, following a dynamic programming approach. | |||
For clarity, some documentation is provided with the functions. The | |||
main function defined at the end of the source code simply calls both functions | |||
for a specified Fibonacci index and prints out the results. | |||
== Profiling entire programs == | |||
In order to profile the full program you just need | |||
to invoke it (here for the Fibonacci index 30) similar to | |||
python -m cProfile -s time fib.py 30 | |||
The additional command line option <tt>-m cProfile</tt> enables the <tt>cProfile</tt> | |||
profiling tool (which is included with the standard Python modules). The further | |||
option <tt>-s time</tt> sorts the profiling results according to the total | |||
time spent in the respective modules or functions (time spent in subfunctions is | |||
excluded; default sorting is according to the function name): | |||
<nowiki> | |||
$ python -m cProfile -s time fib.py 30 | |||
fib_rec(30)=832040 | |||
fib_dyn(30)=832040 | |||
2692542 function calls (6 primitive calls) in 1.092 seconds | |||
Ordered by: internal time | |||
ncalls tottime percall cumtime percall filename:lineno(function) | |||
2692537/1 1.092 0.000 1.092 1.092 fib.py:3(fib_rec) | |||
1 0.000 0.000 1.092 1.092 fib.py:1(<module>) | |||
1 0.000 0.000 0.000 0.000 fib.py:19(fib_dyn) | |||
2 0.000 0.000 0.000 0.000 {range} | |||
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} | |||
</nowiki> | |||
As evident from the profiling output, the dynamic programming variant outperforms the | |||
recursive procedure by far. Also, the number of calls to the function <tt>fib_rec</tt> amounts | |||
to 2692537. Out of these, one call was invoked non-recursively. | |||
For completeness, note that you might store the profiling (raw-)data in a file and later | |||
postprocess that data using the <tt>pstat</tt> module, see [http://docs.python.org/2/library/profile.html#the-stats-class here]. | |||
== Profiling particular functions == | |||
A convenient way to profile particular functions, only, is to write a further ''profiling script''. | |||
Within that script you might import the <tt>cProfile</tt> module directly and use, e.g., the | |||
<tt>run</tt> method implemented by that module. E.g., so as to profile the function <tt>fib_rec</tt> | |||
you might set up a script similar to the script called <tt>profileScript.py, listed below | |||
(you could do this via an interactive session as well, of course): | |||
<nowiki> | |||
import cProfile | |||
import fib | |||
cProfile.run("fib.fib_rec(30)") | |||
</nowiki> | |||
which then yields the output: | |||
<nowiki> | |||
$ python profileSkript.py | |||
2692539 function calls (3 primitive calls) in 1.080 seconds | |||
Ordered by: standard name | |||
ncalls tottime percall cumtime percall filename:lineno(function) | |||
1 0.000 0.000 1.080 1.080 <string>:1(<module>) | |||
2692537/1 1.080 0.000 1.080 1.080 fib.py:3(fib_rec) | |||
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} | |||
</nowiki> | |||
Note that you can specify an output file for the profiling data by changing the last line of <tt>profileScript.py</tt> | |||
to: <tt>cProfile.run("fib.fib_rec(30)","outFile.dat")</tt>. This will then yield the file <tt>outFile.dat</tt>, containing | |||
the profiling data, in the current working directory. | |||
== Profiling only parts of your code == | |||
If you want to profile only parts of a function or any other bit of code surrounded by further | |||
code you might proceed as follows: start a profiling session using the <tt>cProfile</tt> commands | |||
and directly disable that session after initializing it. Then, enable the profiling session only for | |||
the bits of code you want to profile. E.g., to profile only parts of the function <tt>fib_dyn</tt> | |||
you might assemble a script similar to <tt>profileScript2.py</tt>: | |||
<nowiki> | |||
import sys | |||
import cProfile | |||
prof = cProfile.Profile() | |||
prof.disable() | |||
def fib_dyn(N): | |||
if N<2: return N | |||
myArray =range(N+1) | |||
prof.enable() | |||
for i in range(2,N+1): | |||
myArray[i]=myArray[i-1]+myArray[i-2] | |||
prof.disable() | |||
return myArray[-1] | |||
def main(): | |||
N = int(sys.argv[1]) | |||
print "fib_dyn(%d)=%d"%(N,fib_dyn(N)) | |||
prof.print_stats() | |||
main() | |||
</nowiki> | |||
Invoking the script in the usual way then yields: | |||
<nowiki> | |||
$ python profileSkript2.py 30 | |||
fib_dyn(30)=832040 | |||
2 function calls in 0.000 seconds | |||
Ordered by: standard name | |||
ncalls tottime percall cumtime percall filename:lineno(function) | |||
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} | |||
1 0.000 0.000 0.000 0.000 {range} | |||
</nowiki> | |||
For this particular example, the running times falls below the resolution of the profiling tool, but | |||
the number of function calls can be interpreted well and the idea behind profiling bits of code | |||
using python should be clear. |
Revision as of 16:56, 24 September 2013
Profiling using gprof
The GNU profiler gprof is a profiling tool which might be used to identify those parts of your program that are especially time consuming. It is thus a valuable tool that helps to invest your time to improve the right parts of your code in order to eventually decrease the running time of your application.
A profiling cycle consists of the following three steps:
- compiling the program for profiling,
- executing the program,
- postprocessing the profiling raw data using gprof.
All three steps are subsequently illustrated by means of a small example program.
A basic example program
As a basic example program to illustrate the use of the gprof profiling tool, consider the following source code of a program written in C, here referred to as myExample.c:
#include <stdio.h> #include <stdlib.h> /* function prototypes */ int fib_rec(int); int fib_dyn(int); /* fib_rec() * Compute Fibonacci number fN for specified index N * * NOTE: * Recursive programming variant to compute fN * * Input: * N - Fibonacci index * * Returns: (fN) * fN - Fibonacci number for index N */ int fib_rec(int N){ if(N<2){ return N; } else{ return fib_rec(N-1)+fib_rec(N-2); } } /* fib_dyn() * Compute Fibonacci number fN for specified index N * * NOTE: * Dynamic programming variant to compute fN * * Input: * N - Fibonacci index * * Returns: (fN) * fN - Fibonacci number for index N */ int fib_dyn(int N){ int *myArray,i,fN; if(N<2){ return N; } myArray=(int *) malloc((N+1)*sizeof(int)); myArray[0]=0; myArray[1]=1; for(i=2;i<=N;i++){ myArray[i]=myArray[i-1]+myArray[i-2]; } fN=myArray[N]; free(myArray); return fN; } int main(int argc, char *argv[]) { int N; N = atoi(argv[1]); printf("fib_rec(%d) = %d\n",N,fib_rec(N)); printf("fib_dyn(%d) = %d\n",N,fib_dyn(N)); return 0; }
For clarity some documentation is also provided. The purpose of the code is quite simple:
it compares two different ways to compute the Fibonacci number for a given index
. In general, the sequence of Fibonacci numbers follows the recurrence relation
. The recurrence can be
started using and , see e.g. here.
The above source code implements two different functions, following two different programming paradigms, to compute for a given input index :
fib_rec
, following a recursive programming approach, and, fib_dyn
, following a dynamic programming approach.
The main function defined within the source code simply calls both functions for a specified Fibonacci index and prints out the result.
Since grows rather fast with increasing , the above implementation faces severe problems for large values of . However, even modest values of the Fibonacci index will do for the illustration of gprof.
Compiling for profiling
So as to benefit from gprof you need to compile and link (in case of several modules) your program so that it provides further information for profiling. Using gcc this is done by adding the compiler option -pg in addition to the options you usually use. For the above example you might thus type
gcc myExample.c -o myExample -pg
This then yields the executable myExample, compiled for profiling.
A basic profiling session
To retrieve the profiling information you first need to run the program. Here, calling the program for the Fibonacci index 30 yields the output
$ ./myExample 30 fib_rec(30) = 832040 fib_dyn(30) = 832040
and, most importantly, the file gmon.out. The latter file can be postprocessed via gprof to yield the profiling information. Therefore you simply need to type
gprof ./myExample gmon.out
The details listed by gprof comprise the flat profile and the call graph.
Flat profile
To focus solely on the flat profile you can use the commandline option -p
$ gprof -p ./myExample gmon.out Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls ms/call ms/call name 100.74 0.01 0.01 1 10.07 10.07 fib_rec 0.00 0.01 0.00 1 0.00 0.00 fib_dyn % the percentage of the total running time of the time program used by this function. cumulative a running sum of the number of seconds accounted seconds for by this function and those listed above it. self the number of seconds accounted for by this seconds function alone. This is the major sort for this listing. calls the number of times this function was invoked, if this function is profiled, else blank. self the average number of milliseconds spent in this ms/call function per call, if this function is profiled, else blank. total the average number of milliseconds spent in this ms/call function and its descendents per call, if this function is profiled, else blank. name the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parenthesis it shows where it would appear in the gprof listing if it were to be printed.
From the flat profile it should be evident that the program spends almost all of its time in the function fib_rec (usually these numbers will NOT add up to 100%). However, even for the time consuming part of this program, the absolute time listed is about 0.01 sec for the function calls to fib_rec which almost is too small for the time resolution used by gprof. However, note that it is hardly possible to improve the running time of the function fib_rec (here, even changing the function to an inline function will not help further).
Call graph
To focus on the call graph, only, you can use the commandline option -q. Below, the additional option -b was used to suppress lots of verbose statements:
$ gprof -q -b ./myExample gmon.out Call graph granularity: each sample hit covers 2 byte(s) for 99.26% of 0.01 seconds index % time self children called name 2692536 fib_rec [1] 0.01 0.00 1/1 main [2] [1] 100.0 0.01 0.00 1+2692536 fib_rec [1] 2692536 fib_rec [1] ----------------------------------------------- <spontaneous> [2] 100.0 0.00 0.01 main [2] 0.01 0.00 1/1 fib_rec [1] 0.00 0.00 1/1 fib_dyn [3] ----------------------------------------------- 0.00 0.00 1/1 main [2] [3] 0.0 0.00 0.00 1 fib_dyn [3] ----------------------------------------------- Index by function name [3] fib_dyn [1] fib_rec
All information needed to interpret the call graph are listed if the verbose statements are not suppressed by means of the commandline option -b.
As evident from the calling graph, the function fib_rec, listed as [1] is called 2692537 times during the computation of fib_rec(30). This can be seen from the column entitled called. For this quantity, gprof states that called is the number of times the function was called. If the function called itself recursively, the number only includes non-recursive calls, and is followed by a + and the number of recursive calls. This large number of recursive function calls explains why the majority of the running time is spent in that function. This reflects that during the recursive computation of a Fibonacci number, intermediate results are computed over and over again. In contrast to this, the function fib_dyn, listed as [3] is way more efficient in storing and reconsidering intermediate results. Hence its superior running time (which, however falls below the resolution of gprof).
For completeness lets have more detailed look at the first row:
index % time self children called name 2692536 fib_rec [1] 0.01 0.00 1/1 main [2] [1] 100.0 0.01 0.00 1+2692536 fib_rec [1] 2692536 fib_rec [1]
The first two entries refer to the parents of the function fib_rec (since the function called itself recursively, it is also a parent of itself). In the called column, the number of times the function returned to the respective parent function (specified in column name) is listed. So from the first entry it is evident that fib_rec returned 2692536 times to the parent function fib_rec. From the second line it can be seen that the function fib_rec returned 1 time to the parent function main (further detailed at index [2]). Entries three and four should be interpreted together. Entry three lists the number of times, the function fib_rec was called in the form non-recursive calls+recursive calls. Entry four, where the function name in column name is indented relative to the previous line, indicates that fib_rec has to be interpreted as a child function here. In this regard, the number listed in column called refers to the number of times, this child function was called.
When you link your code with a shared library, gprof can, unfortunately, not be used. Then you need to use the program sprof instead. sprof is a tool to read and display shared object profiling data.
Just for arguments, subdivide the example code listed above into three files, fibfunc.h:
#ifndef fib_h__ #define fib_h__ int fib_rec(int), fib_dyn(int); #endif // fin_h__
fibfunc.c:
#include <stdlib.h> int fib_rec(int N){ if(N<2){ return N; } else{ return fib_rec(N-1)+fib_rec(N-2); } } int fib_dyn(int N){ int *myArray,i,fN; if(N<2){ return N; } myArray=(int *) malloc((N+1)*sizeof(int)); myArray[0]=0; myArray[1]=1; for(i=2;i<=N;i++){ myArray[i]=myArray[i-1]+myArray[i-2]; } fN=myArray[N]; free(myArray); return fN; }
and myExample.c:
#include <stdio.h> #include <stdlib.h> #include "fibfuncs.h" int main(int argc, char *argv[]) { int N; N = atoi(argv[1]); printf("fib_rec(%d) = %d\n",N,fib_rec(N)); printf("fib_dyn(%d) = %d\n",N,fib_dyn(N)); return 0; }
Then, you might create a shared library called libfib.so by means of
gcc -c -Wall -Werror -fpic fibfuncs.c -o fibfuncs.o gcc -shared -o libfib.so fibfuncs.o gcc -Wall -o myExample myExample.c -L. -lfib
Further, you need to amend the LD_LIBRARY_PATH to include the actual location of the shared object, e.g., by means of
export LD_LIBRARY_PATH=/path-to-your-lib:$LD_LIBRARY_PATH
A basic profiling session
Then you are able to execute the binary myExample. So as to enable profiling you also need to set the LD_PROFILE environmnt variable to point to the proper shared object. Here,
export LD_PROFILE=libfib.so
is appropriate. Then, running the binary will create a profiling file in /var/tmp/, see
$ ./myExample 30 fib_rec(30) = 832040 fib_dyn(30) = 832040 $ ls /var/tmp/ libfib.so.profile
which can be postprocessed by calling sprof for the shared object further specifying the raw data file according to:
$ sprof libfib.so libfib.so.profile Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls us/call us/call name 0.00 0.00 0.00 2692537 0.00 fib_rec 0.00 0.00 0.00 1 0.00 fib_dyn index % time self children called name [0] 0.0 0.00 0.00 0 malloc@@GLIBC_2.2.5 [0] 0.00 0.00 1/1 fib_dyn [2] 0.00 0.00 1/2692537 fib_rec [1] ----------------------------------------------- 0.00 0.00 2692536/2692537 fib_rec [1] 0.00 0.00 1/2692537 <UNKNOWN> [1] 0.0 0.00 0.00 2692537 fib_rec [1] 0.00 0.00 2692536/2692537 fib_rec [1] ----------------------------------------------- 0.00 0.00 1/1 <UNKNOWN> [2] 0.0 0.00 0.00 1 fib_dyn [2] -----------------------------------------------
As above, the postprocessed profiling information can be split up into the flat profile and the call graph. Both can be interprated similar to the output of gprof.
Profiling Python programs
If you use the Python programming language, then profiling your program is quite an easy task. As Python comes with batteries included, you don't need a further program in order to profile your script. The Python module which enables you to profile your programs is called cProfile, see here. The most direct way to obtain profiling information is then to simply call the interpreter with an additional command line option. However, if you would like to profile particular functions or specific bits of code, only, you need to invest a bit more effort.
A basic example program
In order to illustrate how to profile programs written in the Python programming language, consider the subsequent example program, here referred to as fib.py:
import sys def fib_rec(N): '''Compute Fibonacci number fN for specified index N NOTE: Recursive programming variant to compute fN Input: N - Fibonacci index Returns: (fN) fN - Fibonacci number for index N ''' if N<2: return N else: return fib_rec(N-1)+fib_rec(N-2) def fib_dyn(N): '''Compute Fibonacci number fN for specified index N NOTE: Dynamic programming variant to compute fN Input: N - Fibonacci index Returns: (fN) fN - Fibonacci number for index N ''' if N<2: return N myArray =range(N+1) for i in range(2,N+1): myArray[i]=myArray[i-1]+myArray[i-2] return myArray[-1] if __name__=="__main__": N = int(sys.argv[1]) print "fib_rec(%d)=%d"%(N,fib_rec(N)) print "fib_dyn(%d)=%d"%(N,fib_dyn(N))
The above Python script is similar to the previous example program written in C: it compares two different ways to compute the Fibonacci number for a given index . Therefore the program implements two different functions: fib_rec, following a recursive programming approach, and, fib_dyn, following a dynamic programming approach. For clarity, some documentation is provided with the functions. The main function defined at the end of the source code simply calls both functions for a specified Fibonacci index and prints out the results.
Profiling entire programs
In order to profile the full program you just need to invoke it (here for the Fibonacci index 30) similar to
python -m cProfile -s time fib.py 30
The additional command line option -m cProfile enables the cProfile profiling tool (which is included with the standard Python modules). The further option -s time sorts the profiling results according to the total time spent in the respective modules or functions (time spent in subfunctions is excluded; default sorting is according to the function name):
$ python -m cProfile -s time fib.py 30 fib_rec(30)=832040 fib_dyn(30)=832040 2692542 function calls (6 primitive calls) in 1.092 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 2692537/1 1.092 0.000 1.092 1.092 fib.py:3(fib_rec) 1 0.000 0.000 1.092 1.092 fib.py:1(<module>) 1 0.000 0.000 0.000 0.000 fib.py:19(fib_dyn) 2 0.000 0.000 0.000 0.000 {range} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
As evident from the profiling output, the dynamic programming variant outperforms the recursive procedure by far. Also, the number of calls to the function fib_rec amounts to 2692537. Out of these, one call was invoked non-recursively. For completeness, note that you might store the profiling (raw-)data in a file and later postprocess that data using the pstat module, see here.
Profiling particular functions
A convenient way to profile particular functions, only, is to write a further profiling script. Within that script you might import the cProfile module directly and use, e.g., the run method implemented by that module. E.g., so as to profile the function fib_rec you might set up a script similar to the script called profileScript.py, listed below (you could do this via an interactive session as well, of course):
import cProfile import fib cProfile.run("fib.fib_rec(30)")
which then yields the output:
$ python profileSkript.py 2692539 function calls (3 primitive calls) in 1.080 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 1.080 1.080 <string>:1(<module>) 2692537/1 1.080 0.000 1.080 1.080 fib.py:3(fib_rec) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Note that you can specify an output file for the profiling data by changing the last line of profileScript.py to: cProfile.run("fib.fib_rec(30)","outFile.dat"). This will then yield the file outFile.dat, containing the profiling data, in the current working directory.
Profiling only parts of your code
If you want to profile only parts of a function or any other bit of code surrounded by further code you might proceed as follows: start a profiling session using the cProfile commands and directly disable that session after initializing it. Then, enable the profiling session only for the bits of code you want to profile. E.g., to profile only parts of the function fib_dyn you might assemble a script similar to profileScript2.py:
import sys import cProfile prof = cProfile.Profile() prof.disable() def fib_dyn(N): if N<2: return N myArray =range(N+1) prof.enable() for i in range(2,N+1): myArray[i]=myArray[i-1]+myArray[i-2] prof.disable() return myArray[-1] def main(): N = int(sys.argv[1]) print "fib_dyn(%d)=%d"%(N,fib_dyn(N)) prof.print_stats() main()
Invoking the script in the usual way then yields:
$ python profileSkript2.py 30 fib_dyn(30)=832040 2 function calls in 0.000 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 1 0.000 0.000 0.000 0.000 {range}
For this particular example, the running times falls below the resolution of the profiling tool, but the number of function calls can be interpreted well and the idea behind profiling bits of code using python should be clear.