Difference between revisions of "Valgrind"

From HPC users

Revision as of 17:01, 11 December 2012

The program valgrind is a tool for detecting memory leaks and memory access violations in a program. It tracks the allocated memory and checks whether it was freed. In addition, it checks whether a program writes outside its allocated memory or whether threads write to the same memory at the same time.


Usage

To use valgrind, the code should be compiled with debug symbols (option -g for most compilers) so that errors can be traced back to the source code line. After compiling, the program is typically started under valgrind with

 valgrind -v --leak-check=full <program> [program_options] >& valgrind.out

The output of valgrind is typically very long and sometimes not easy to understand. For more information see the user guide.
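
Because the log can be long, it may help to look at the summary lines first. The following is a minimal sketch, assuming the default memcheck tool, which normally ends its log with an "ERROR SUMMARY" line and, for leaks, lines such as "definitely lost:". The function name valgrind_summary is hypothetical and not part of valgrind.

```shell
#!/bin/bash
# Hypothetical helper (not part of valgrind): print only the summary
# lines of a valgrind log. Assumes memcheck's usual "ERROR SUMMARY"
# and "definitely/indirectly/possibly lost:" lines are present.
valgrind_summary() {
    local logfile="${1:-valgrind.out}"   # default: the file name used above
    grep -E 'ERROR SUMMARY|(definitely|indirectly|possibly) lost:' "$logfile"
}

# Usage (after a valgrind run has written valgrind.out):
#   valgrind_summary valgrind.out
```

The full log is still needed to find the source code line of an individual error; the summary only shows whether there is something to look for.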

For analyzing parallel MPI programs, the mpirun/mpiexec command including its options must precede the valgrind command line shown above. In this case it is useful to write the output of each process into a separate file with the rank as suffix. This can be done with valgrind's command line option --log-file and its %q{} substitution. For example, valgrind can be started by

 mpirun -np <number_of_processes> ... valgrind -v --leak-check=full --log-file="valgrind.out.%q{PMI_RANK}" <mpiprogram> [program_options]

for Intel MPI. Here %q{PMI_RANK} is replaced by the value of the environment variable PMI_RANK, which is set by Intel MPI. For OpenMPI, the variable name PMI_RANK has to be replaced by OMPI_COMM_WORLD_RANK. Each process/rank then writes its output to its own file valgrind.out.<rank>, with ranks running from 0 to <number_of_processes>-1. If, for example, the error occurs on rank 123, the related message can be found in the file valgrind.out.123.
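
If the same call has to work with both MPI libraries, the choice of the variable name can be scripted. The following is only a sketch: the function name valgrind_log_pattern is hypothetical, and the fallback to %p (valgrind's substitution for the process ID) when neither variable is set is an assumption about what is wanted, not something this page prescribes.

```shell
#!/bin/bash
# Hypothetical helper: print a --log-file pattern that matches the
# MPI library in use, based on which rank variable is set.
valgrind_log_pattern() {
    if [ -n "${PMI_RANK:-}" ]; then
        # Intel MPI sets PMI_RANK
        echo 'valgrind.out.%q{PMI_RANK}'
    elif [ -n "${OMPI_COMM_WORLD_RANK:-}" ]; then
        # OpenMPI sets OMPI_COMM_WORLD_RANK
        echo 'valgrind.out.%q{OMPI_COMM_WORLD_RANK}'
    else
        # Assumption: without a rank variable, use the process ID (%p)
        echo 'valgrind.out.%p'
    fi
}

# Usage inside a small wrapper script that is started by mpirun:
#   valgrind -v --leak-check=full --log-file="$(valgrind_log_pattern)" <mpiprogram> [program_options]
```

The function has to run inside each MPI process (for example in a wrapper script), because the rank variables are only set there and not in the shell that calls mpirun.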

Note:

  • Running a program under valgrind increases its runtime dramatically, so the tests should be done on small problems.
  • Sometimes error messages occur that originate in compiler libraries (especially those of the Intel compiler) and are not really errors.


Known issues

  • On FLOW, valgrind has some problems with the Intel compiler option -x...., e.g. -xHost. Please omit this compiler option when testing programs built with the Intel compilers.
  • Binaries built with the Intel compilers show a lot of error messages in some libraries due to uninitialized values (e.g. in the function vsprintf). These messages come from the optimized Intel libraries and can be ignored.
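
Messages that are known to be harmless can also be hidden with valgrind's suppression mechanism (option --suppressions). The following is a sketch only: the function name write_intel_supp, the file name intel.supp, the entry name, and the single vsprintf frame are all illustrative assumptions. Real entries depend on the actual call stack and are best generated by valgrind itself with --gen-suppressions=all.

```shell
#!/bin/bash
# Hypothetical helper: write an example suppression file for memcheck
# reports about conditional jumps on uninitialised values (Memcheck:Cond)
# whose call stack contains vsprintf. The entry name and the frame are
# assumptions for illustration, not taken from a real Intel library trace.
write_intel_supp() {
    local suppfile="${1:-intel.supp}"
    cat > "$suppfile" <<'EOF'
{
   intel_vsprintf_uninit_cond
   Memcheck:Cond
   ...
   fun:vsprintf
}
EOF
}

# Usage:
#   write_intel_supp intel.supp
#   valgrind -v --leak-check=full --suppressions=intel.supp <program> [program_options]
```

Suppressions should be kept as narrow as possible, since a broad entry could also hide real errors in the tested program.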


External links