Intro to PDB, the Python Debugger

Looks like crap, but tastes great

May 09, 2023

Summary

PDB is an ugly but convenient debugger that is always available with Python.

Using the breakpoint() function, you can pause any program at a specific line, and enter a debugging shell.

In this shell you can run any Python code and access the program state at this line.

You can also use PDB commands to help you explore your program:

help lists all commands or shows the help on a command
quit exits the debugger
list . shows you where you are
next executes the next line
continue runs the program until the next stop
until line runs the program until a line number
jump line skips ahead to a line number without executing the code in between
display shows the result of an expression when it changes
step and return go in and out of function calls
up and down zoom in and out of the call stack

Why learn to use PDB?

When I started programming, I used a very crude method of debugging: print().

But now that I have almost 2 decades of experience, I can tell you that... I still use mostly print().

Although now I type print() way faster.

Once in a while, some bugs escape the magical transcendence of print(), and I have to use tooling.

The most fundamental of tools for the job is the debugger, and I meet more and more coders that have never used one, so I decided to write a post for them.

There are plenty of debuggers for Python, and a lot of editors come with one. However, we are very lucky because the language itself provides one by default!

It's really bare-bone, but it's always there. No matter your OS, the Python version, what tools you use, PDB is always there for you.

Also, it's fast. Debuggers tend to slow down the program they debug, but PDB has a very minimal overhead.

This is why, despite the fact it is not the comfiest debugger in town, nor the prettiest, ...nor the anythingest, it's good to know how to use it.

In fact, if you know how to use PDB, you know how to use any other debugger, so it's time well invested.

First step in PDB

Imagine you have a small script that checks blood type compatibility:

compat = {
    "O-": ["O-"],
    "O+": ["O+", "O-"],
    "A-": ["A-", "O-"],
    "A+": ["A+", "A-", "O-", "O+"],
    "B-": ["B-", "O-"],
    "B+": ["B-", "B+", "O-", "O+"],
    "AB-": ["AB-", "B-", "O-", "A-"],
    "AB+": ["AB+", "O+", "A-", "A+", "B-", "B+", "AB-", "AB+", "O-"],
}


def survive(blood_type, donated_blood):
    return donated_blood in compat[blood_type]


def main():
    blood_type = input("Enter your blood type: ")
    donated_blood = input("Enter the blood type you received: ")

    if survive(blood_type, donated_blood):
        print("No, not I, I will survive")
    else:
        print("ded")


if __name__ == "__main__":
    main()

Sometimes, it raises a KeyError:

Enter your blood type: a+
Enter the blood type you  received: b-
Traceback (most recent call last):
  File "/home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py", line 28, in <module>
    main()
  File "/home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py", line 21, in main
    if survive(blood_type, donated_blood):
  File "/home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py", line 14, in survive
    return donated_blood in compat[blood_type]
KeyError: 'a+'

We can explore the state of the program right before the error by calling breakpoint() just before line 14:

def survive(blood_type, donated_blood):
    breakpoint()
    return donated_blood in compat[blood_type]

And start the program all over again.

This will run the program until this point, and start the Python debugger:

Enter your blood type: a+
Enter the blood type you  received: a-
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(15)survive()
-> return donated_blood in compat[blood_type]
(Pdb)

(Pdb) is the prompt of a new type of shell, a debugging shell. It has access to the entire state of the program at this point. You can enter any valid Python code, provided it fits on one line:

(Pdb) print('hello')
hello
(Pdb) from datetime import date
(Pdb) date.today()
datetime.date(2023, 5, 9)
(Pdb)

But more interestingly, you can use the current state of the program in the code:

(Pdb) print(donated_blood)
a-
(Pdb) compat[blood_type]
*** KeyError: 'a+'

And just like that, we have found that this is the particular part of the code that triggers the error. We can now experiment live:

(Pdb) "a+" in compat
False
(Pdb) compat.keys()
dict_keys(['O-', 'O+', 'A-', 'A+', 'B-', 'B+', 'AB-', 'AB+'])
(Pdb) "A+" in compat
True

So the problem was that we used a lowercase "a", and the dictionary contains uppercase "A". We can even check a solution in the debugger:

(Pdb) compat[blood_type.upper()]
['A+', 'A-', 'O-', 'O+']

At this stage, we can stop our session by using quit (without parentheses):

(Pdb) quit
Traceback (most recent call last):
  ...
  File "/usr/lib/python3.8/bdb.py", line 113, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit

This quite literally crashes the program to exit immediately. Don’t worry, no snakes have been harmed.
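
To make the fix stick, the same normalization can go into the script. This is only a sketch: the .strip() call and the local names recipient and donor are my additions, not something the session above tested, and the table is repeated so the snippet runs on its own.

```python
# Blood types each recipient (the key) can safely receive (the values).
compat = {
    "O-": ["O-"],
    "O+": ["O+", "O-"],
    "A-": ["A-", "O-"],
    "A+": ["A+", "A-", "O-", "O+"],
    "B-": ["B-", "O-"],
    "B+": ["B-", "B+", "O-", "O+"],
    "AB-": ["AB-", "B-", "O-", "A-"],
    # Note the comma between "A+" and "B-": without it, Python silently
    # glues the two strings together into "A+B-".
    "AB+": ["AB+", "O+", "A-", "A+", "B-", "B+", "AB-", "O-"],
}


def survive(blood_type, donated_blood):
    # Normalize case (and stray spaces, an extra precaution) so that
    # "a+" finds the "A+" key instead of raising KeyError.
    recipient = blood_type.strip().upper()
    donor = donated_blood.strip().upper()
    return donor in compat[recipient]
```

With this version, survive("a+", "a-") returns True instead of crashing.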

PDB commands

We just used quit without parentheses, and it did something. That's uncommon in Python.

It's because quit is not regular Python code, but rather a PDB command.

The most important command is help, which on its own lists all the other commands:

(Pdb) help

Documented commands (type help <topic>):
========================================
EOF    c          d        h         list      q        rv       undisplay
a      cl         debug    help      ll        quit     s        unt
alias  clear      disable  ignore    longlist  r        source   until
args   commands   display  interact  n         restart  step     up
b      condition  down     j         next      return   tbreak   w
break  cont       enable   jump      p         retval   u        whatis
bt     continue   exit     l         pp        run      unalias  where

Miscellaneous help topics:
==========================
exec  pdb

And if help is given a command name, it prints some information about this command. E.G., to get some help about the next command:

(Pdb) help next
n(ext)
        Continue execution until the next line in the current function
        is reached or it returns.

The `list` command

list . (notice the dot) will list the line of code where you are. E.G., if I put a break point here:

    breakpoint()
    print("No, not I, I will survive")

Then:

(Pdb) list .
 18  	    blood_type = "A+" or input("Enter your blood type: ")
 19  	    donated_blood = "A+" or input("Enter the blood type you  received: ")
 20
 21  	    if survive(blood_type, donated_blood):
 22  	        breakpoint()
 23  ->	        print("No, not I, I will survive")
 24  	    else:
 25  	        print("ded")
 26
 27
 28  	if __name__ == "__main__":

The arrow tells you the next line to be executed is line 23.

If you don't pass the dot, list will paginate 11 lines from the previous list call. It's not that useful, so use the dot.

The `next` command

next will execute the next line (the one with the arrow in list). If we are in this context:

(Pdb) list .
 18  	    blood_type = "A+" or input("Enter your blood type: ")
 19  	    donated_blood = "A+" or input("Enter the blood type you  received: ")
 20
 21  	    if survive(blood_type, donated_blood):
 22  	        breakpoint()
 23  ->	        print("No, not I, I will survive")
 24  	    else:
 25  	        print("ded")
 26
 27
 28  	if __name__ == "__main__":

Then using next will do:

(Pdb) next
No, not I, I will survive
--Return--
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(23)main()->None
-> print("No, not I, I will survive")

You can see before the "--Return--" that "No, not I, I will survive" has been printed.

The `continue` command

continue will carry on the program execution until the next break point. If no break point is encountered, the program will execute normally until it ends.

The `until` command

until line will carry on the program execution until the line number you give to it is reached. Very useful to go through a loop. If we are here:

(Pdb) list .
 18  	    breakpoint()
 19
 20  ->	    for x in range(10):
 21  	        print(x)
 22
 23  	    print("Dobby is freeeeeeee")
 24

Then we can do:

(Pdb) until 23
0
1
2
3
4
5
6
7
8
9
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(23)main()
-> print("Dobby is freeeeeeee")

To execute everything until we reach line 23 (which is not executed yet).

until will honor break points, so it may not reach the line you give to it if there is a break point on the way.

The `jump` command

jump is like until, but it goes directly to the line, and does not execute any code in between. Useful to skip some code you don't want to run. Also, it skips break points on the way. With the same example as before:

(Pdb) jump 23
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(23)main()
-> print("Dobby is freeeeeeee")

You can see nothing is printed, because the loop is not executed at all and we jump directly to line 23.

The `display` command

display code will run the Python code you give to it and display the value. It will do this when you call it, then every time some part of the program is executed AND the value changes. So it may display the value if you call next, continue, until, etc.

You can use it to keep track of some calculation as you explore the program without having to print() it every time. E.G., let's print the current time:

(Pdb) display datetime.now()
display datetime.now(): datetime.datetime(2023, 5, 9, 9, 26, 9, 113475)


(Pdb) next
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(20)main()
-> donated_blood = "A+" or input("Enter the blood type you  received: ")
display datetime.now(): datetime.datetime(2023, 5, 9, 9, 26, 10, 922923)  [old: datetime.datetime(2023, 5, 9, 9, 26, 9, 113475)]


(Pdb) next
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(22)main()
-> if survive(blood_type, donated_blood):
display datetime.now(): datetime.datetime(2023, 5, 9, 9, 26, 12, 93853)  [old: datetime.datetime(2023, 5, 9, 9, 26, 10, 922923)]

In some debuggers, this feature is called "watch expression" or "spy expression".

If you don't want to see it anymore, call undisplay.
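
If you wonder what display is doing, the idea fits in a few lines. This is a rough sketch of a watch expression, not PDB's actual implementation: the Watch class and its check method are names I made up, and the message format merely imitates the output above.

```python
class Watch:
    """Re-evaluate an expression and report it only when its value changes."""

    _UNSET = object()  # marker meaning "never evaluated yet"

    def __init__(self, expression):
        self.expression = expression
        self.last = self._UNSET

    def check(self, namespace):
        """Return a message on the first call or on change, else None."""
        value = eval(self.expression, {}, namespace)
        if self.last is self._UNSET:
            message = f"display {self.expression}: {value!r}"
        elif value != self.last:
            message = f"display {self.expression}: {value!r}  [old: {self.last!r}]"
        else:
            message = None
        self.last = value
        return message


watch = Watch("total * 2")
state = {"total": 1}
first = watch.check(state)       # first call: always reported
unchanged = watch.check(state)   # same value: nothing to report
state["total"] = 5
changed = watch.check(state)     # new value: reported with the old one
print(first, unchanged, changed, sep="\n")
```

A debugger would call something like check() after every next, step or continue, which is why the value only shows up when it moved.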

The `step` and `return` commands

step and return are both used together to get inside and exit functions or methods.

Indeed, if you are here:

 25
 26  	    breakpoint()
 27  ->	    if survive(blood_type, donated_blood):
 28  	        print("No, not I, I will survive")

And you call next, you will execute survive(blood_type, donated_blood) and go to line 28. But you will not see what happens inside survive.

step is like next, but for this particular case. Instead of running the survive function in one go, it enters the function and stops on its first line:

(Pdb) step
--Call--
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(13)survive()
-> def survive(blood_type, donated_blood):


(Pdb) list .
  8  	    "AB-": ["AB-", "B-", "O-", "A-"],
  9  	    "AB+": ["AB+", "O+", "A-", "A+", "B-", "B+", "AB-", "AB+", "O-"],
 10  	}
 11
 12
 13  ->	def survive(blood_type, donated_blood):
 14  	    return donated_blood in compat[blood_type]
 15
 16
 17  	def main():
 18  	    for x in range(10):

This way, you can call next and see how this function is working step by step.

return does the opposite of step. You use it from inside a function to get to the end of its execution immediately:

(Pdb) return
--Return--
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(14)survive()->True
-> return donated_blood in compat[blood_type]

Now, you can next your way out of the function and go back where you were before step.

The `up` and `down` commands

up and down are my favorite commands. They don't execute anything, but they zoom in and out of the code by letting you go up and down the call stack.

If this means nothing to you, it's a way to answer the old question, "who is the idiot passing this function the wrong parameters?".

Let's say I'm here:

(Pdb) list .
 10  	}
 11
 12
 13  	def survive(blood_type, donated_blood):
 14  	    breakpoint()
 15  ->	    return donated_blood in compat[blood_type]
 16
 17
 18  	def main():
 19  	    for x in range(10):
 20  	        print(x)

If I want to know what part of the code is passing me blood_type, I can use up:

(Pdb) up
> /home/user/Work/ecriture/bytecode.dev/newsletter/20230508_pdb/script.py(27)main()
-> if survive(blood_type, donated_blood):


(Pdb) list .
 22  	    print("Dobby is freeeeeeee")
 23
 24  	    blood_type = "A+" or input("Enter your blood type: ")
 25  	    donated_blood = "A+" or input("Enter the blood type you  received: ")
 26
 27  ->	    if survive(blood_type, donated_blood):
 28  	        print("No, not I, I will survive")
 29  	    else:
 30  	        print("ded")
 31
 32

I've now zoomed out to see the bigger picture, and I can see that line 27 is where this parameter comes from.

Of course, I can use down to go back to my previous zoom level, or use up again to zoom out even more (although in this program, I'm already at the top).

It's easy to confuse up/down with step/return, but they don't do the same thing. up/down don't execute anything, they just change your point of view. Plus, you can step into any function call you come across, but down can only be used if you called up before.

Despite that, after a step, I like to use up + next to get out of a function instead of return + next. I find using up usually gives me the same result and is easier. Also up lets you peek at what happens upstairs and go back, while return is quite definitive.

It's a matter of taste, really.
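
If the call stack still feels abstract, you can print it from plain Python. A small sketch, assuming a made-up helper named call_chain built on the standard inspect module: it lists the functions currently running, innermost first, which is the same chain that up and down walk through.

```python
import inspect


def call_chain():
    """Names of the functions on the call stack, innermost first."""
    # context=0 skips loading source lines; we only need the names.
    # [1:] drops the frame of call_chain itself.
    return [info.function for info in inspect.stack(context=0)[1:]]


def survive(blood_type, donated_blood):
    return call_chain()


def main():
    return survive("A+", "A-")


chain = main()
print(chain[:2])  # ['survive', 'main']
```

From a breakpoint inside survive, one up puts you in main, the second entry of that list. Whatever comes after main depends on how the script was started.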

Tips and tricks

Most commands have a short alias, usually a single letter. next can be abbreviated n, help can be shortened to h, etc.
If you need to run a command several times in a row, you can enter it once, then press enter several times. In PDB, an empty prompt reruns the previous command. Very useful if you want to next many times.
Commands can conflict with regular Python code. E.G: the list() Python function and the list PDB command, or the “a” PDB command and an “a” variable. In this case, PDB commands take priority if they are at the start of the line, and Python code otherwise. You can force PDB to understand something as Python code by starting with “!”. E.G: !list().
You can have as many breakpoint() as you want, don't limit yourself to one. In fact, there is even a command to add breakpoints to any line while running PDB: break.
You can put breakpoint() in an if, if you want to trigger it only under a condition. It's so useful there is even a command for that: condition. But I confess I usually write the if manually because my editor makes it easy.
Prior to Python 3.7, breakpoint() didn't exist, and people had to type import pdb; pdb.set_trace().
There are better shell debuggers. E.G: ipdb is like PDB + IPython. You can set which debugger breakpoint() starts with the environment variable PYTHONBREAKPOINT. If this means nothing to you, there is another article for that.
We didn't cover all commands in this article. Once you get comfortable with PDB, explore the other ones.
If you start a Python script using python -m pdb path/to/script.py, you will start PDB in post-mortem mode. This will start a PDB shell immediately, but if you continue, it will run the entire program normally. However, if the program crashes, it will open a debugging shell right where the exception occurred. Very useful to debug a crash. The command supports -m itself, so you can do python -m pdb -m module_to_debug.
Annoyingly, post-mortem debugging drops you into a debugger at the very start of the program, which forces you to continue to get to the exception. You can pass -c c to avoid this, as it runs the continue command automatically for you.
The PDB shell allows only one line of code, and has weird scoping. If you feel that it’s limiting you, type interact and you will be dropped into a regular shell. Exit the shell to resume debugging.
If you pass --pdb to pytest, it will start a debugging shell at every failing test, at the line where it failed.
The following shell function is very useful on Linux:

debug_module() {
  if python -c "import ipdb" &>/dev/null; then
    python -m ipdb -c c -m "$@"
  else
    python -m pdb -c c -m  "$@"
  fi
}

With this, calling debug_module your_module will automatically use all the tricks in the book: use ipdb if it exists, use the double “-m”, call “-c c”, etc.
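
To illustrate the tip about putting breakpoint() in an if, here is a minimal sketch using a trimmed-down compat table (an assumption, to keep it short). The debugger only opens for blood types missing from the table, right before the KeyError would happen.

```python
# Trimmed table for the example; the real script has all eight blood types.
compat = {"A+": ["A+", "A-", "O-", "O+"]}


def survive(blood_type, donated_blood):
    if blood_type not in compat:
        # Pause only for the inputs that are about to raise KeyError.
        breakpoint()
    return donated_blood in compat[blood_type]
```

Calling survive("A+", "O-") runs straight through, while survive("a+", "O-") drops you into the debugger first. Setting PYTHONBREAKPOINT=0 in the environment turns every breakpoint() into a no-op, which is handy if one slips into a run where you don't want a shell.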

NMS

Jul 6, 2023

Great article. pdb is one of those tools which has a pretty large ROI payoff and everyone should learn to use. One point I feel should be mentioned (and the main reason I switched to using the PyCharm debugger) is that pdb doesn't support debugging multithreaded programs very well (See https://github.com/python/cpython/issues/85743, https://github.com/python/cpython/issues/67352 and https://github.com/python/cpython/issues/65480)

Tom

May 11, 2023

Loving this post. I use pdb because I’m debugging code that executes within a docker container and docker compose. I use breakpoint(), step, continue, next, and list quite a bit. Jump, until, and display are new to me though. Does display stop and show a value whenever a value changes, even if you’re next-ing over a function call?
