Debugging


For debugging, several debuggers are available with the compiler packages Intel Cluster Studio and PGI Compiler. Additionally, the graphical frontend ddd for the GNU debugger gdb is available as a module.


Debugging using the GNU debugger

The GNU Project debugger (GDB) is a valuable tool that helps you analyze what caused your program to crash. With GDB you can execute your program line by line, stop at specified positions, and display or alter the state of its variables. GDB can debug programs written in many compiled languages, such as C and C++. Below we consider a basic (malfunctioning) program written in C and see how GDB can be used to discover where the problem is located. The example is meant to illustrate several features of GDB.

A basic malfunctioning example program

To illustrate how GDB can be used to detect bugs in a program, consider the following source code for a malfunctioning program written in C (with annotated line numbers), here referred to as myExample.c:

 
  1 #include <stdio.h>
  2 #include <stdlib.h>
  3 
  4 
  5 void listArray(int *, int);
  6 
  7 
  8 void listArray(int *myArray, int N){
  9   int i;
 10 
 11   printf("array = [");
 12   for(i=0; i<N; i++){
 13     printf("%d ",myArray[i]);
 14   }
 15   printf("]\n");
 16 
 17 }
 18 
 19 
 20 int main(int argc, char *argv[]) {
 21   int *myArray,N,i;
 22 
 23   N=10;
 24   myArray = (int *) malloc(-N*sizeof(int));
 25 
 26   for(i=0; i<N; i++){
 27     myArray[i]=i;
 28   }
 29 
 30   free(myArray);
 31   return 0;
 32 }
  

Besides the main function, which simply allocates an array and initializes its entries, the program implements one additional function called listArray, which lists a specified number of leading array entries. It is not called by main; it will be invoked from within GDB later on.

Compiling for elaborate debugging information

To benefit from GDB, you need to compile your program so that it includes debugging information. Using gcc, this is done by adding the compiler option -g. For the above example you might thus type

gcc myExample.c -o myExample -g

This yields the executable myExample, which includes debugging symbols.

A debugging session using GDB

If you invoke the program in a straightforward manner it will fail, resulting in

 $ ./myExample 
 Segmentation fault

So, why is this? Although you might have already spotted the reason for the segmentation fault, let's use GDB to hunt down the error! To start GDB in order to debug the example program, simply type

 gdb myExample

on the command line. This only enters the interactive GDB mode; it does not run your program right away.

To have a look at the source code that belongs to the executable myExample, you can use the GDB command list in one of several ways. If you vaguely remember the name of a GDB command but cannot recall how to properly use it, you can use the help function to find out about the command. E.g. to obtain details on the list command, simply type:

 
(gdb) help list
List specified function or line.
With no argument, lists ten more lines after or around previous listing.
"list -" lists the ten lines before a previous ten-line listing.
One argument specifies a line, and ten lines are listed around that line.
Two arguments with comma between specify starting and ending lines to list.
Lines can be specified in these ways:
  LINENUM, to list around that line in current file,
  FILE:LINENUM, to list around that line in that file,
  FUNCTION, to list around beginning of that function,
  FILE:FUNCTION, to distinguish among like-named static functions.
  *ADDRESS, to list around the line containing that address.
With two args if one is empty it stands for ten lines away from the other arg.
  

Hence to see the beginning of the main function one has to type

 
(gdb) list main
15        printf("]\n");
16      
17      }
18      
19      
20      int main(int argc, char *argv[]) {
21        int *myArray,N,i;
22      
23        N=10;
24        myArray = (int *) malloc(-N*sizeof(int));
  

which yields the 10 lines surrounding the beginning of the main function. To get the next 10 lines, just press Enter.

Now, in order to execute your program type

 
(gdb) run
Starting program: /user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/gdb/c/myExample 

Program received signal SIGSEGV, Segmentation fault.
0x000000000040070f in main (argc=1, argv=0x7fffffffe5a8) at myExample.c:27
27          myArray[i]=i;
  

So, there appears to be a problem at line 27 in the source code, at which point GDB reports a segmentation fault. To have a look at the context where the error occurred you can type list 27 to find:

 
(gdb) list 27
22      
23        N=10;
24        myArray = (int *) malloc(-N*sizeof(int));
25      
26        for(i=0; i<N; i++){ 
27          myArray[i]=i;
28        }
29      
30        free(myArray);
31        return 0;
  

One way to proceed is to do a clean restart and set a so-called breakpoint at line 27. This will cause the program flow to be interrupted as soon as it reaches line 27, right before the statement in that line, namely myArray[i]=i;, is executed:

 
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) break 27
Breakpoint 1 at 0x4006f8: file myExample.c, line 27.
(gdb) run
Starting program: /user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/gdb/c/myExample 
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000

Breakpoint 1, main (argc=1, argv=0x7fffffffe5a8) at myExample.c:27
27          myArray[i]=i;
  

Now, you can display the variables before line 27 is executed. E.g. to see the current value of the iteration variable i you can type print i to find

 
(gdb) print i
$1 = 0
  

Similarly, the address in memory of the array myArray can be listed by typing

 
(gdb) print myArray
$2 = (int *) 0x0
  

At this point, note that the address stored in myArray looks suspicious. Obviously something went wrong! After a successful allocation it should hold a non-zero address (compare, e.g., 0x7fffffffe5a8, the address of the array argv in the argument list of the main function). Now, 0x0 is the hexadecimal representation of 0. Hence, at line 27, myArray is a NULL pointer. Bear in mind that whenever malloc fails to reserve the requested memory, it returns the NULL pointer. Hence, the address 0x0 is a hint that something went wrong during the allocation.

To narrow down the error further, let's see how the allocation affects the pointer myArray. The memory allocation is done at line 24, so let's set an additional breakpoint via break 24 and do a clean restart by again calling kill. By the way, if you have already set several breakpoints and want an overview of them, you can type

 
(gdb) info b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000004006f8 in main at myExample.c:27
        breakpoint already hit 1 time
2       breakpoint     keep y   0x00000000004006d8 in main at myExample.c:24
  

Now, running the program will first stop before the statement in line 24 is executed, where we can check the address stored in myArray before and after the attempted memory allocation:

 
(gdb) run
Starting program: /user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/gdb/c/myExample 
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000

Breakpoint 2, main (argc=1, argv=0x7fffffffe5a8) at myExample.c:24
24        myArray = (int *) malloc(-N*sizeof(int));
(gdb) print myArray
$3 = (int *) 0x7fffffffe5a0
(gdb) next
26        for(i=0; i<N; i++){ 
(gdb) print myArray
$4 = (int *) 0x0
  

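Before the allocation, myArray is uninitialized and holds an arbitrary value; afterwards it is NULL. And indeed, by looking at the argument of the function malloc it should be clear that the minus sign supplied with the size of the desired memory block caused the trouble: since sizeof yields a value of the unsigned type size_t, -N is converted to a huge unsigned number, so malloc is asked for far more memory than is available and returns NULL.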
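Why the minus sign leads to a NULL pointer can be checked with a small sketch. The helper requestedBytes below is hypothetical (it is not part of myExample.c) and merely reproduces the size expression from line 24. It assumes a platform where size_t is at least as wide as int, as is the case on common 64-bit Linux systems.

```c
#include <stddef.h>

/* Hypothetical helper, not part of myExample.c: returns the number of
   bytes that the expression in line 24 requests from malloc. */
size_t requestedBytes(int N) {
  /* sizeof(int) has the unsigned type size_t, so -N is converted to
     size_t before the multiplication and wraps around to a huge value. */
  return -N * sizeof(int);
}
```

For N=10 and a 4-byte int on a 64-bit system, the request amounts to 2^64 - 40 bytes, which malloc cannot satisfy.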

Now, correcting the error by changing line 24 to

 myArray = (int *) malloc(N*sizeof(int));

yields a functioning program.
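
A failed allocation is easier to spot if the return value of malloc is checked right away. The following is a minimal sketch of such a check; the helper name allocIntArray is an assumption made for illustration and does not appear in the original program.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of myExample.c: allocates an array of
   N ints. Returns NULL for a non-positive N or a failed allocation. */
int *allocIntArray(int N) {
  int *a;

  if (N <= 0) {
    fprintf(stderr, "allocIntArray: invalid length %d\n", N);
    return NULL;
  }
  a = malloc((size_t)N * sizeof *a);
  if (a == NULL) {
    fprintf(stderr, "allocIntArray: out of memory\n");
  }
  return a;
}
```

In main one could then write myArray = allocIntArray(N); and leave the program with an error code if the result is NULL, instead of running into the segmentation fault at line 27.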

Further features of GDB

To illustrate further features of GDB, consider the corrected variant of the example program. Starting a GDB session for the corrected program offers the possibility to further explore some of its features:

Modifying the value of variables

Say you set a breakpoint at line 24. Looking up the value of N there yields 10, of course. If you want to change the value of N to 15 for the remaining session, you might type

 set var N=15

Calling functions within GDB

Say you set a breakpoint at line 29, i.e. right after the values of the array are initialized. Since line 29 is empty, GDB stops at the next line that contains code, i.e. line 30. If you are interested in whether all entries are properly initialized, you can call the function listArray defined in the source code by typing:

 
(gdb) break 29
Breakpoint 1 at 0x40071b: file myExample.c, line 29.
(gdb) run
Starting program: /user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/gdb/c/myExample 
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000

Breakpoint 1, main (argc=1, argv=0x7fffffffe5a8) at myExample.c:30
30        free(myArray);
(gdb) call listArray(myArray,N)
array = [0 1 2 3 4 5 6 7 8 9 ]
  

The possibility to directly use functions declared in the source code is especially useful if you want to display the content of intricate custom data structures.
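
As an illustration of such a helper for a custom data structure, consider the following sketch. The linked list type Node and the function listNodes are hypothetical and not part of myExample.c; they only show the kind of function that is convenient to have available for call.

```c
#include <stdio.h>

/* Hypothetical singly linked list, used for illustration only. */
typedef struct Node {
  int value;
  struct Node *next;
} Node;

/* Prints all entries of the list and returns the number of nodes. */
int listNodes(const Node *head) {
  const Node *p;
  int count = 0;

  printf("list = [");
  for (p = head; p != NULL; p = p->next) {
    printf("%d ", p->value);
    count++;
  }
  printf("]\n");
  return count;
}
```

At a breakpoint where a pointer to the first node is in scope (here assumed to be called head), the list can then be displayed by typing call listNodes(head).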

Listing array entries

An easier way to list single entries of an array and whole ranges of array entries is as follows (continuing the previous GDB session):

 
(gdb) print myArray[0]
$1 = 0
(gdb) print myArray[0]@5
$2 = {0, 1, 2, 3, 4}
(gdb) print myArray[0]@15
$3 = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 135121, 0, 0, 0, 0}
  

Note that the last command also displays memory outside the range of the array myArray. This is because the expression myArray[0]@X simply shows X consecutive entries starting at myArray[0], regardless of how many entries were actually allocated. If you are familiar with the concept of pointers, you might immediately think of the correspondence between pointers and addresses in memory. By using the dereference operator, i.e. the prefix *, you can access the value stored at the respective memory address:

 
(gdb) print *(myArray)
$4 = 0
(gdb) print *(myArray+1)
$5 = 1
(gdb) print *(myArray+10)
$6 = 135121
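
As the output above shows, neither the @ operator nor pointer arithmetic checks whether an address still belongs to the array. In your own code, such a range check has to be done by hand. A minimal sketch, using the hypothetical helper getEntry (not part of myExample.c):

```c
#include <stddef.h>

/* Hypothetical helper: copies entry i of an array of length n to *out.
   Returns 1 on success and 0 (leaving *out untouched) if a pointer is
   NULL or i is out of range. */
int getEntry(const int *a, int n, int i, int *out) {
  if (a == NULL || out == NULL || i < 0 || i >= n) {
    return 0;
  }
  *out = *(a + i); /* pointer arithmetic, identical to a[i] */
  return 1;
}
```

With such a helper, an access like entry 10 of the 10-element array is rejected instead of silently returning whatever happens to be stored behind the array.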
  

Executing source code line by line

Say you set a breakpoint at line 27. Then you can continue to execute the program line by line by typing the GDB command step. As an additional argument you can specify the number of times the command should be executed:

 
(gdb) break 27
Breakpoint 1 at 0x4006f6: file myExample.c, line 27.
(gdb) run
 
Starting program: /user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/gdb/c/myExample 
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000

Breakpoint 1, main (argc=1, argv=0x7fffffffe5a8) at myExample.c:27
27          myArray[i]=i;
(gdb) print i
$1 = 0
(gdb) step 2

Breakpoint 1, main (argc=1, argv=0x7fffffffe5a8) at myExample.c:27
27          myArray[i]=i;
(gdb) print i
$2 = 1
(gdb) step 2

Breakpoint 1, main (argc=1, argv=0x7fffffffe5a8) at myExample.c:27
27          myArray[i]=i;
(gdb) print i
$3 = 2
  

The command next executes the program line by line, similar to step. However, when a subroutine is called, next treats the whole subroutine call as one instruction, whereas step enters the subroutine and passes through it line by line as well.
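
Since main in myExample.c does not call any function defined in the source file, the difference between step and next is hard to observe there. The following sketch provides a loop with a function call to experiment with; the functions square and sumOfSquares are made up for this purpose.

```c
/* Hypothetical callee: step enters this function, next steps over it. */
int square(int x) {
  return x * x;
}

/* Sums i*i for i = 0, ..., n-1. The line calling square is a
   convenient place to compare step and next. */
int sumOfSquares(int n) {
  int i, sum = 0;

  for (i = 0; i < n; i++) {
    sum += square(i);
  }
  return sum;
}
```

To try it, call sumOfSquares from a main function, compile with -g, and set a breakpoint via break sumOfSquares. Typing step at the line that calls square continues inside square, whereas next proceeds within sumOfSquares.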

Conditional breakpoints

You can also provide a condition which needs to be met in order for the program to stop at a specified breakpoint. Say, e.g., you want to examine how the array entries change within the for loop (lines 26-28) when i==5. To achieve this, you can set a breakpoint at line 27 and attach the condition i==5 to it, so that the program runs until the condition is met:

 
(gdb) break 27
Breakpoint 1 at 0x4006f6: file myExample.c, line 27.
(gdb) condition 1 i==5
(gdb) run
Starting program: /user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/gdb/c/myExample 

Breakpoint 1, main (argc=1, argv=0x7fffffffe5a8) at myExample.c:27
27          myArray[i]=i;
(gdb) print i
$1 = 5
(gdb) print myArray[0]@N
$2 = {0, 1, 2, 3, 4, 0, 0, 0, 0, 0}
(gdb) step
26        for(i=0; i<N; i++){ 
(gdb) print myArray[0]@N
$3 = {0, 1, 2, 3, 4, 5, 0, 0, 0, 0}
  

Using DDD for debugging

Note that the HPC system also supports users who prefer the "Data Display Debugger" (DDD) over GDB, since it features an intuitive graphical user interface. Because DDD opens its own windows, you need to log in to your HPC account using the -X option, similar to

 ssh -X abcd1234@hero.hpc.uni-oldenburg.de

to forward the X Window connection through the ssh connection. DDD is not available by default; you need to load the respective module first. This is done via

 module load ddd

After this you can, e.g., run DDD on the example discussed above. For this, simply navigate to the respective working directory and type (remember to compile the program with debugging information)

 ddd myExample

A snapshot from an actual debugging session (for the above example; breakpoint at line 23, program flow currently at line 27) using DDD is shown below.

(Figure: DebuggingSession ddd.png)


Debugging using the Intel debugger

The local HPC system also features several Intel Cluster Studio (ics) add-ons. Among those you can find the Intel debugger idb along with its command line variant idbc. To be able to use them, you first need to load the Intel ics module:

 module load intel/ics/2013.5.192/64

Then, you can compile the source code for debugging:

 icc -o myExample myExample.c -g  

(here, the program myExample.c is the same example program as the one used in the preceding sections).

To invoke the Intel debugger for the resulting application myExample in command line mode you simply need to type

 idbc myExample

similar to the procedure for the GNU debugger. If you are familiar with the GNU debugger, you can use the Intel debugger in a straightforward manner, since most of the commands are the same. E.g. to list the 10 lines surrounding line number 25 you might just type

 
(idb) list 25 
20	int main(int argc, char *argv[]) {
21	  int *myArray,N,i;
22	
23	  N=10;
24	  myArray = (int *) malloc(-N*sizeof(int));
25	
26	  for(i=0; i<N; i++){ 
27	    myArray[i]=i;
28	  }
29	
  

Then, to set a breakpoint at line 24 and to run the code until that breakpoint is hit you might type

 
(idb) break 24
Breakpoint 1 at 0x400622: file
/user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/idb/c/myExample.c, line 24.
(idb) run
Starting program: /user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/idb/c/myExample
[New Thread 8593 (LWP 8593)]

Breakpoint 1, main (argc=1, argv=0x7fffffffd8f8) at
/user/fk5/ifp/agcompphys/alxo9476/wmwr/debug/idb/c/myExample.c:24
24	  myArray = (int *) malloc(-N*sizeof(int));
  

Then, printing the address stored in myArray, stepping forward to the next line, and printing the address again (which effectively locates the error in the source code already) is done via

 
(idb) print myArray
$1 = (int *) 0x7fffffffd8f0
(idb) step
26	  for(i=0; i<N; i++){ 
(idb) print myArray
$2 = (int *) 0x0
  

So, not only are the commands similar, but the responses of the Intel debugger also resemble those of the GNU debugger, discussed above in more detail.

Although it is also possible to use the Intel debugger on programs compiled via a GNU compiler (this works by specifying an additional command line option in the form idbc -gdb myExample), it is recommended to use debuggers along with their corresponding compilers (e.g., compiler icc with debugger idbc and compiler gcc with debugger gdb).

More information on the Intel debugger can e.g. be found in the Intel Debugger Command Reference.