Valgrind
Valgrind is a tool for detecting memory leaks and memory access violations in a program. It tracks allocated memory and checks whether it is freed. It also checks whether a program writes outside its allocated memory and whether threads write to the same memory location at the same time.
Usage
To use Valgrind, first load the current release with
module load valgrind
The code should be compiled with debug symbols (the -g option for most compilers) so that errors can be traced back to the source code line. After compiling, the program is typically started under Valgrind with
valgrind -v --leak-check=full <program> [program_options] >& valgrind.out
Valgrind's output is typically very long and not always easy to understand. For more information, see the user guide.
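The compile-and-run steps above can be wrapped in a small shell function. This is only a sketch: the function name run_memcheck and the file names myprog.c and myprog are placeholders, not part of Valgrind or of the module. It uses "> valgrind.out 2>&1", which should behave like the ">&" redirection in any POSIX shell.

```shell
# Run a program under Valgrind and collect all output in valgrind.out.
# $1: program to test, remaining arguments: options passed to the program
run_memcheck() {
    prog=$1
    shift
    # "> file 2>&1" is the POSIX spelling of the ">&" redirection
    valgrind -v --leak-check=full "$prog" "$@" > valgrind.out 2>&1
}

# Usage (myprog.c and myprog are placeholder names):
#   gcc -g -o myprog myprog.c
#   run_memcheck ./myprog
```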
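To get a first overview of a long log, it can help to print only the summary lines. The following sketch assumes the usual memcheck wording ("ERROR SUMMARY", "definitely lost" and so on) and the file name valgrind.out from the command above. The function name valgrind_summary is a placeholder.

```shell
# Print only the summary lines of a Valgrind log.
# $1: log file (defaults to valgrind.out)
valgrind_summary() {
    grep -E 'ERROR SUMMARY|definitely lost|indirectly lost|possibly lost' "${1:-valgrind.out}"
}

# Usage:
#   valgrind_summary
#   valgrind_summary other.log
```

The full log is still needed to find the source line of each reported error.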
To analyze parallel MPI programs, the mpirun/mpiexec command including its options must precede the command line above. In this case it can be useful to write the output of each process to a separate file with the rank as suffix. This can be done with Valgrind's command line option --log-file and its %q{} substitution. For example, Valgrind can be started with
mpirun -np <number_of_processes> ... valgrind -v --leak-check=full --log-file="valgrind.out.%q{PMI_RANK}" <mpiprogram> [program_options]
for Intel MPI. Here %q{PMI_RANK} is replaced by the value of the environment variable PMI_RANK, which is set by Intel MPI. For OpenMPI, replace PMI_RANK with OMPI_COMM_WORLD_RANK. The output of each process/rank is then written to its own file valgrind.out.<rank>, with ranks numbered from 0 to <number_of_processes>-1. If, for example, the error occurs on rank 123, the related message can be found in the file valgrind.out.123.
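With many ranks it is tedious to open every log by hand. As a sketch, the per-rank files can be searched for a non-zero error count. This assumes the log prefix valgrind.out used above and the usual "ERROR SUMMARY: N errors" line at the end of each log. The function name ranks_with_errors is a placeholder.

```shell
# List the per-rank logs that report at least one error.
# $1: log file prefix (defaults to valgrind.out, matching --log-file)
ranks_with_errors() {
    # -l prints only the names of matching files;
    # [1-9] skips logs that end with "ERROR SUMMARY: 0 errors"
    grep -lE 'ERROR SUMMARY: [1-9]' "${1:-valgrind.out}".* 2>/dev/null
}

# Usage:
#   ranks_with_errors
```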
Note:
- Running a program under Valgrind increases its runtime dramatically, so tests should be done on small problems.
- Some error messages originate in compiler libraries (especially those of the Intel compilers) and do not indicate a real error.
Known issues
- Binaries built with the Intel compilers show many error messages about uninitialized values in some libraries (e.g. in the function vsprintf). These messages come from the optimized Intel libraries and can be ignored.
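Instead of ignoring such messages by eye, they can be hidden with a Valgrind suppression file. The following is only a sketch: the file name intel.supp, the entry name and the frame fun:vsprintf are assumptions, and the frames actually reported for a given binary may differ. Running Valgrind with --gen-suppressions=all prints entries that match the real messages and can be copied into the file.

```shell
# Write a suppression file (hypothetical name and entry). The entry hides
# "conditional jump depends on uninitialised value" reports whose
# innermost stack frame is vsprintf.
cat > intel.supp <<'EOF'
{
   intel_vsprintf_uninit
   Memcheck:Cond
   fun:vsprintf
}
EOF

# Usage (myprog is a placeholder name):
#   valgrind --leak-check=full --suppressions=intel.supp ./myprog
```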