Debugging MPI programs with GDB

28 October 2021

Debugging programs that run in parallel can be really difficult. It is hard to keep track of state across multiple processes and observe bugs that are timing-dependent, such as deadlocks. This post will not solve any of these problems, but at least it will show you how to easily attach a debugger to a C++ program that uses the Message Passing Interface (MPI).

This is a quick tip in two parts.

Part 1: The code snippet

#include <fstream>
#include <unistd.h>
#include <iostream>

using namespace std;

void attach_debugger(bool condition) {
    if (!condition) return;
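    // volatile: GDB changes this flag from outside the program, so the
    // compiler must re-read it on every loop iteration
    volatile bool attached = false;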

    // also write PID to a file
    ofstream os("/tmp/mpi_debug.pid");
    os << getpid() << endl;
    os.close();

    cout << "Waiting for debugger to be attached, PID: "
        << getpid() << endl;
    while (!attached) sleep(1);
}

This snippet checks whether the condition is true (for example, rank == 0). If so, it writes its process ID (PID) to both standard output and a temporary file. After that, it enters an endless loop with seemingly no means of escape, if it weren’t for Part 2!
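If the condition is true for more than one rank, a single PID file is not enough, since each waiting process would overwrite it. Here is a minimal sketch of a variant that writes one file per rank. The names pid_file_for_rank, write_pid_file and attach_debugger_for_rank are hypothetical and chosen only for illustration, as is the /tmp/mpi_debug.<rank>.pid naming scheme. The MPI call site is shown only as a comment.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <unistd.h>

// Hypothetical helper: one PID file per rank, e.g. /tmp/mpi_debug.3.pid.
std::string pid_file_for_rank(int rank) {
    return "/tmp/mpi_debug." + std::to_string(rank) + ".pid";
}

// Hypothetical helper: write this process's PID to `path`.
// Returns false if the file could not be written.
bool write_pid_file(const std::string& path) {
    std::ofstream os(path);
    os << getpid() << std::endl;
    return os.good();
}

// Variant of attach_debugger that takes the rank explicitly.
void attach_debugger_for_rank(int rank, bool condition) {
    if (!condition) return;
    // volatile: the flag is changed from outside the program (by GDB),
    // so the compiler must re-read it on every iteration.
    volatile bool attached = false;

    write_pid_file(pid_file_for_rank(rank));
    std::cout << "Rank " << rank << " waiting for debugger, PID: "
              << getpid() << std::endl;
    while (!attached) sleep(1);
}

// Intended call site in an MPI program (assumed, not compiled here):
//   int rank;
//   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
//   attach_debugger_for_rank(rank, rank == 0 || rank == 3);
```

With this variant, the PID file path you hand to gdb -p and the function name you select in GDB have to change to match.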

Part 2: The debugger

I typically use the GNU Debugger (GDB), since it is easy to use from the command line. When you run an MPI program that contains the snippet above, it will prompt you to attach a debugger. Now, you could type gdb -p xyz, where xyz is the PID the program just printed, set the attached flag to true and resume program execution, but that gets tedious if you have to do it twice, or twenty times in a row!

Instead, put the following snippet into ~/.gdbinit:

define dbg
    # select the stack frame of attach_debugger (GDB usually stops inside sleep)
    select-frame function attach_debugger
    set var attached = true
end

Then you can simply run gdb -p $(</tmp/mpi_debug.pid) (the $(<file) form works in bash and zsh), type dbg and press Enter, and voilà, you are ready to set breakpoints, continue, and search for those pesky bugs.

Happy bug hunting!

#programming #debugging