A bit of Python black magic that lets you efficiently inspect and manipulate execution contexts after crashes, aka, post-mortem debugging.
Author
Nima Sarang
Published
January 30, 2025
1 Introduction
I bet this experience sounds familiar:
Your code has been running smoothly for a while, and you’re finally breathing easy. Then, an hour into execution, an error occurs. The worst part? You have no idea what caused it based on the logs alone, and the source is buried 8 calls deep in the stack trace. You’re dreading the thought of validating the data from start to finish to find the root cause, but there’s no other way.
Well, I might’ve found the solution to this. Python’s built-in traceback objects let you inspect the variables at each step of the stack trace and, even better, save the current context for reuse later! If you’re familiar with the pdb debugger, this article is similar in spirit but with major differences.
This is the most useful code I’ve written in ages, and it took some mighty effort to get it right due to the lack of proper documentation. If you want the TL;DR, just skip to Section 4. Otherwise, let’s dive in.
2 Post-mortem
In debugging, “post-mortem” means figuring out what went wrong after an error has already occurred. It requires no advance setup such as breakpoints or logging.
When a Python program runs, it maintains a call stack that tracks the sequence of function calls. Each entry on this stack is called a frame and represents a function call in progress. A frame contains information about the function in progress, such as the function’s code object, its local variables, the global variables in its namespace, references to the previous frame, and so on. When an exception occurs in Python, the interpreter captures the entire call stack at the moment of the error and preserves all the frames. This stack trace shows the execution path that led to the error.
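To make the idea of a frame concrete, here is a minimal sketch that peeks at the frame of a running function. The function names (`describe_current_call`, `outer`) are made up for illustration, and `inspect.currentframe()` is a CPython-oriented helper that may return `None` on other interpreters.

```python
import inspect


def describe_current_call():
    secret = 42  # a local variable that lives in this call's frame
    frame = inspect.currentframe()  # frame of the function currently executing
    return {
        "function": frame.f_code.co_name,      # name taken from the code object
        "locals": dict(frame.f_locals),        # snapshot of the local variables
        "caller": frame.f_back.f_code.co_name, # f_back links to the previous frame
    }


def outer():
    return describe_current_call()


info = outer()
print(info["function"], "was called by", info["caller"])
print("secret =", info["locals"]["secret"])
```

The same attributes (`f_code`, `f_locals`, `f_globals`, `f_back`) are what we get access to after a crash, only then through the traceback instead of the running function.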
To leverage this, the most common approach is using the Python debugger, pdb, which lets you inspect each frame in the traceback, and even execute code within the context of the frame. The IPython version of pdb, ipdb, is available in IPython- and Jupyter-based environments and makes navigating the stack even easier.
I won’t be explaining pdb here, since what I’m about to propose is an alternative to it. Still, if you’re interested in learning more, the pdb page in the official Python documentation is a good place to start.
For years I’ve wanted a way to easily inspect the context that led to an error. This might be manageable in a notebook where all your code lives together, but for any decent-sized project where code sprawls across multiple files? Not so much.
I understand that pdb offers a solution to some extent, but my main gripe with it is how clunky it feels in terms of navigation and code execution. Why do we need another interactive interface within an interactive notebook?
What sparked my interest was this article by Andy Jones, where he created a helper function that grabs a frame’s context and copies it to IPython’s namespace. This is a great idea, but I wanted to take it further and make it even more user-friendly.
What if we could pick and choose the frames we want to inspect based on the error message, make a copy of the context, and skip pdb altogether as a bonus?
Toy Example
Let’s say we have a project structured like this:
notebook.ipynb
src/
    model.py
    run.py
The code in run.py looks like:
import numpy as np

from .model import ModelClass


def train_model(length: int):
    model = ModelClass()
    data = np.arange(length)
    model.train(data)
And in model.py we have:
class ModelClass:
    def train(self, data):
        try:
            data_norm = data / sum(data)
            self._validate_data(data_norm)
        except:
            raise RuntimeError("Second exception: This is a dummy exception.")

    def _validate_data(self, data):
        if len(data) < 10:
            raise ValueError("First exception: Data is too short.")
Consider a simple scenario where we pass in the wrong data type:
from src.run import train_model

train_model(length="10")

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 3
      1 from src.run import train_model
----> 3 train_model(length="10")

File ~/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/run.py:8, in train_model(length)
      6 def train_model(length: int):
      7     model = ModelClass()
----> 8     data = np.arange(length)
      9     model.train(data)

TypeError: arange() not supported for inputs with DType <class 'numpy.dtypes.StrDType'>.
Oops, wrong input! Before doing anything else, let’s grab a reference to the exception:
import sys

exception_1 = sys.last_value
exception_1
TypeError("arange() not supported for inputs with DType <class 'numpy.dtypes.StrDType'>.")
sys.last_value contains the most recent exception. There’s also sys.last_type and sys.last_traceback, but we can get those from sys.last_value directly.
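One caveat: as far as I know, sys.last_value is only set when an unhandled exception reaches the interactive interpreter (and on Python 3.12+ the same object is also exposed as sys.last_exc). In a plain script you would grab the exception yourself. A minimal sketch, with a made-up `risky` function standing in for real code:

```python
def risky():
    return 1 / 0


try:
    risky()
except ZeroDivisionError as exc:
    # `exc` is deleted when the except block ends, so keep our own reference
    captured = exc

exc_type = type(captured)           # same information as sys.last_type
exc_tb = captured.__traceback__     # same information as sys.last_traceback
print(exc_type.__name__, "-", captured)
```

Keeping a reference to the exception also keeps every frame in its traceback alive, which is exactly what makes post-mortem inspection possible, at the cost of holding on to that memory.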
import traceback

exc_type = type(exception_1)
exc_tb = exception_1.__traceback__

# Helper function to print the traceback
traceback.print_tb(exc_tb)
  File "/Users/nsarang/micromamba/envs/arclight/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/var/folders/29/fh16rbz95b99yz5df3c6yt2h0000gn/T/ipykernel_8225/205929029.py", line 3, in <module>
    train_model(length="10")
  File "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/run.py", line 8, in train_model
    data = np.arange(length)
           ^^^^^^^^^^^^^^^^^
The traceback reveals the chain of events:
The first frame is from the IPython shell running our notebook.
The second frame is our cell call, with that funky filename like T/ipykernel_28470/3835444418.py. That’s because IPython temporarily writes your code to a file and runs it.
The third frame is where the error actually happened.
The traceback object exc_tb is the entry point to the exception’s traceback, which is a linked list with one entry per frame. Each entry’s tb_next points to the next entry (one call deeper), and tb_frame gives you the actual frame object.
traceback.print_tb(exc_tb.tb_next)
  File "/var/folders/29/fh16rbz95b99yz5df3c6yt2h0000gn/T/ipykernel_8225/205929029.py", line 3, in <module>
    train_model(length="10")
  File "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/run.py", line 8, in train_model
    data = np.arange(length)
           ^^^^^^^^^^^^^^^^^
traceback.print_tb(exc_tb.tb_next.tb_next)
  File "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/run.py", line 8, in train_model
    data = np.arange(length)
           ^^^^^^^^^^^^^^^^^
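Chaining tb_next by hand gets old quickly, so here is a small sketch of walking the whole list in a loop. The helper `iter_traceback` and the two `level_*` functions are hypothetical names used only for this illustration.

```python
def iter_traceback(tb):
    """Yield each traceback entry, from the outermost call to the innermost."""
    while tb is not None:
        yield tb
        tb = tb.tb_next


def level_one():
    level_two()


def level_two():
    raise ValueError("boom")


try:
    level_one()
except ValueError as exc:
    entries = list(iter_traceback(exc.__traceback__))

for i, entry in enumerate(entries):
    # tb_frame is the frame object, tb_lineno the line that was executing
    print(i, entry.tb_frame.f_code.co_name, "line", entry.tb_lineno)
```

The standard library offers traceback.walk_tb for the same traversal; it yields (frame, lineno) pairs rather than the traceback entries themselves.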
To make the traceback easier to work with, I want a custom formatter that can:
Control how much context code gets printed around the error
Assign an index to each frame for easy selection later on
Control code indentation for readability
Here’s the implementation:
import sys
import inspect
import textwrap


def process_exception(exception, context_lines=1, max_indent=float('inf'), frame_index=0):
    """
    Process an exception and return a formatted traceback message and frame information.

    Parameters
    ----------
    exception : Exception
        The exception to process.
    context_lines : int, optional
        The number of context lines to include around the current line, by default 1.
    max_indent : int, optional
        The maximum indentation to use for the code block in the error message.
        Defaults to no limit.
    frame_index : int, optional
        The index of the first frame, by default 0.
    """
    tb = exception.__traceback__
    frame_info = []

    while tb is not None:
        # Get high-level frame information
        filename, lineno, function_name, lines, index = inspect.getframeinfo(
            tb, context=context_lines
        )
        # getframeinfo returns None for both when the source is unavailable
        if lines is None:
            lines, index = [], 0

        # Dedent the lines if the entire block is indented more than max_indent
        lines_dedented = textwrap.dedent("".join(lines)).splitlines()
        if lines_dedented and lines[0] and len(lines[0]) - len(lines_dedented[0]) > max_indent:
            lines = textwrap.indent(
                "\n".join(lines_dedented), " " * max_indent
            ).splitlines()

        # Construct the frame message
        start_no = lineno - index
        end_no = lineno + len(lines) - index
        number_width = len(str(end_no))
        frame_message = [
            f"┌─── Frame {frame_index} " + "─" * 40,
            f'Function {function_name}, in file "{filename}"',
        ]
        for i, file_lineno in enumerate(range(start_no, end_no)):
            line = lines[i].rstrip() if i < len(lines) else ""
            prefix = "➤➤➤ " if i == index else "    "
            frame_message.append(f"  {prefix}{file_lineno:{number_width}}: {line}")
        frame_message.append("")

        frame_info.append({
            "message": "\n".join(frame_message),
            "frame": tb.tb_frame,
            "locals": tb.tb_frame.f_locals.copy(),  # shallow copy just in case
            "globals": tb.tb_frame.f_globals.copy(),
            "metadata": {
                "filename": filename,
                "lineno": lineno,
                "function_name": function_name,
                "lines": lines if lines else [],
                "index": index,
            },
        })
        tb = tb.tb_next
        frame_index += 1

    exception_header = f"{type(exception).__name__}: {str(exception)}"
    traceback_message = "\n".join([frame["message"] for frame in frame_info] + [exception_header])
    return traceback_message, frame_info
┌─── Frame 0 ────────────────────────────────────────
Function run_code, in file "/Users/nsarang/micromamba/envs/arclight/lib/python3.11/site-packages/IPython/core/interactiveshell.py"
3575: await eval(code_obj, self.user_global_ns, self.user_ns)
3576: else:
➤➤➤ 3577: exec(code_obj, self.user_global_ns, self.user_ns)
3578: finally:
3579: # Reset our crash handler in place
┌─── Frame 1 ────────────────────────────────────────
Function <module>, in file "/var/folders/29/fh16rbz95b99yz5df3c6yt2h0000gn/T/ipykernel_8225/205929029.py"
1: from src.run import train_model
2:
➤➤➤ 3: train_model(length="10")
┌─── Frame 2 ────────────────────────────────────────
Function train_model, in file "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/run.py"
5:
6: def train_model(length: int):
7: model = ModelClass()
➤➤➤ 8: data = np.arange(length)
9: model.train(data)
TypeError: arange() not supported for inputs with DType <class 'numpy.dtypes.StrDType'>.
frame_info gives us a list of dictionaries, each packed with a frame’s context:
frame_info[2]["locals"]
{'length': '10', 'model': <src.model.ModelClass at 0x107a8af50>}
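A word on that “shallow copy just in case” comment: the copy protects the names, not the objects behind them. Here is a small sketch of the difference, using a made-up `fail_with_state` function rather than the toy project:

```python
def fail_with_state():
    counter = 1
    items = [1, 2, 3]
    raise RuntimeError("stop")


try:
    fail_with_state()
except RuntimeError as exc:
    tb = exc.__traceback__
    while tb.tb_next is not None:  # walk to the innermost entry
        tb = tb.tb_next
    snapshot = tb.tb_frame.f_locals.copy()  # shallow copy, as in process_exception

snapshot["counter"] = 99     # rebinding a name only changes the copy
snapshot["items"].append(4)  # mutating an object is visible through the frame too

print(tb.tb_frame.f_locals["counter"])
print(tb.tb_frame.f_locals["items"])
```

If you need a fully independent snapshot, copy.deepcopy is an option, with the usual caveat that not every object can be deep-copied.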
Processing Chained Exceptions
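Chained exceptions happen when an exception is raised while another is being handled. PEP 3134 describes two kinds: explicit chaining via raise ... from ..., recorded in __cause__, and implicit chaining, recorded in __context__ when an exception is raised inside an except or finally block.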
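As a quick illustration of how chaining shows up on the exception object, the sketch below raises a RuntimeError in three different ways and prints the attributes involved. The function names and the `capture` helper are hypothetical and exist only for this demo.

```python
def explicit():
    try:
        int("not a number")
    except ValueError as err:
        raise RuntimeError("explicit chain") from err


def implicit():
    try:
        int("not a number")
    except ValueError:
        raise RuntimeError("implicit chain")


def suppressed():
    try:
        int("not a number")
    except ValueError:
        raise RuntimeError("suppressed chain") from None


def capture(func):
    """Call func and return the exception it raises (None if it doesn't)."""
    try:
        func()
    except Exception as exc:
        return exc


for func in (explicit, implicit, suppressed):
    exc = capture(func)
    print(
        f"{func.__name__}: __cause__={exc.__cause__!r}, "
        f"__context__={exc.__context__!r}, "
        f"__suppress_context__={exc.__suppress_context__}"
    )
```

Note that __context__ is filled in for all three; `raise ... from ...` additionally sets __suppress_context__, which is why any code walking the chain has to check that flag before following __context__.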
To handle these, I’ll borrow a function from the pdb module that walks the linked exceptions and returns them in chronological order:
def get_chained_exceptions(exc):
    """
    Given an exception, return its chain of exceptions as (exception, reason)
    pairs, oldest first.
    Borrowed and modified from the `pdb` module.
    """
    _exceptions = []
    current = exc
    reason = None

    while current is not None:
        # Stop on cycles, or when the chain has no further link to follow
        if (current, reason) in _exceptions:
            break
        _exceptions.append((current, reason))

        if current.__cause__ is not None:
            current = current.__cause__
            reason = "__cause__"
        elif (
            current.__context__ is not None
            and not current.__suppress_context__
        ):
            current = current.__context__
            reason = "__context__"

    return reversed(_exceptions)
Now let’s wrap it all together:
def extract_from_exception(exception=None, context_lines=5, max_indent=8):
    """
    Build a traceback message with surrounding code context, supporting nested exceptions.

    exception : Exception, optional
        The exception to process. If not provided, the last exception will be used.

    For other parameters, see `process_exception`.
    """
    if exception is None:
        exception = sys.last_value

    frames_info = []
    traceback_message = []

    for exc_value, exc_reason in get_chained_exceptions(exception):
        traceback_message_single, frame_info_single = process_exception(
            exc_value, context_lines, max_indent, frame_index=len(frames_info)
        )
        traceback_message.append(traceback_message_single)
        frames_info.extend(frame_info_single)

        if exc_reason == "__context__":
            traceback_message.append(
                "\n\nDuring handling of the above exception, "
                "another exception occurred:\n\n"
            )
        elif exc_reason == "__cause__":
            traceback_message.append(
                "\n\nThe above exception was the direct cause "
                "of the following exception:\n\n"
            )

    traceback_message = "\n".join(traceback_message)
    return traceback_message, frames_info
4 Working Examples
4.1 Frame Inspection
Let’s try a more complex example with multiple exceptions. Using our earlier example from Section 3, let’s run train_model with a length of 5:
from src.run import train_model

train_model(length=5)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/model.py:5, in ModelClass.train(self, data)
      4     data_norm = data / sum(data)
----> 5     self._validate_data(data_norm)
      6 except:

File ~/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/model.py:11, in ModelClass._validate_data(self, data)
     10 if len(data) < 10:
---> 11     raise ValueError("First exception: Data is too short.")

ValueError: First exception: Data is too short.

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
Cell In[15], line 3
      1 from src.run import train_model
----> 3 train_model(length=5)

File ~/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/run.py:9, in train_model(length)
      7 model = ModelClass()
      8 data = np.arange(length)
----> 9 model.train(data)

File ~/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/model.py:7, in ModelClass.train(self, data)
      5     self._validate_data(data_norm)
      6 except:
----> 7     raise RuntimeError("Second exception: This is a dummy exception.")

RuntimeError: Second exception: This is a dummy exception.
The above is Jupyter’s error message. Let’s see how ours compares:
┌─── Frame 0 ────────────────────────────────────────
Function train, in file "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/model.py"
3: try:
4: data_norm = data / sum(data)
➤➤➤ 5: self._validate_data(data_norm)
6: except:
7: raise RuntimeError("Second exception: This is a dummy exception.")
┌─── Frame 1 ────────────────────────────────────────
Function _validate_data, in file "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/model.py"
7: raise RuntimeError("Second exception: This is a dummy exception.")
8:
9: def _validate_data(self, data):
10: if len(data) < 10:
➤➤➤ 11: raise ValueError("First exception: Data is too short.")
ValueError: First exception: Data is too short.
During handling of the above exception, another exception occurred:
┌─── Frame 2 ────────────────────────────────────────
Function run_code, in file "/Users/nsarang/micromamba/envs/arclight/lib/python3.11/site-packages/IPython/core/interactiveshell.py"
3575: await eval(code_obj, self.user_global_ns, self.user_ns)
3576: else:
➤➤➤ 3577: exec(code_obj, self.user_global_ns, self.user_ns)
3578: finally:
3579: # Reset our crash handler in place
┌─── Frame 3 ────────────────────────────────────────
Function <module>, in file "/var/folders/29/fh16rbz95b99yz5df3c6yt2h0000gn/T/ipykernel_8225/4277452375.py"
1: from src.run import train_model
2:
➤➤➤ 3: train_model(length=5)
┌─── Frame 4 ────────────────────────────────────────
Function train_model, in file "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/run.py"
5:
6: def train_model(length: int):
7: model = ModelClass()
8: data = np.arange(length)
➤➤➤ 9: model.train(data)
┌─── Frame 5 ────────────────────────────────────────
Function train, in file "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/model.py"
5: self._validate_data(data_norm)
6: except:
➤➤➤ 7: raise RuntimeError("Second exception: This is a dummy exception.")
8:
9: def _validate_data(self, data):
RuntimeError: Second exception: This is a dummy exception.
Look at that! The full chain of exceptions is displayed with the relevant code context. We can now inspect the variables at each step and even rerun the code to see what happens.
print(frame_info[1]["message"])

# Check if the length of 'data' is actually less than 10
frame_info[1]["locals"]
┌─── Frame 1 ────────────────────────────────────────
Function _validate_data, in file "/Users/nsarang/Nimas/nsarang.github.io/blog/2025-01-30-post-mortem/src/model.py"
7: raise RuntimeError("Second exception: This is a dummy exception.")
8:
9: def _validate_data(self, data):
10: if len(data) < 10:
➤➤➤ 11: raise ValueError("First exception: Data is too short.")
What if we want to execute some code within a frame’s context, just like in pdb?
def execute(source: str, context: dict):
    """
    Execute the given source code in the given context.
    """
    source = textwrap.dedent(source)
    # compile up front; exec() also accepts a source string and compiles it internally
    code = compile(source, "<string>", "exec")
    exec(code, context["globals"], context["locals"])

execute(
    r"""
    data = data + 10
    print("data:", data)
    print("locals:", locals().keys())
    """,
    context=frame_info[4],
)
See that? The data variable got updated, and the new value shows up in the locals dictionary. Keep in mind that this is the copy stored in frame_info, not the live frame.
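One gotcha worth knowing about: when exec receives separate globals and locals dictionaries, the snippet runs as if it were the body of a class, so a function defined inside the snippet cannot see the frame’s locals. The sketch below shows the failure and one possible workaround, merging both dictionaries into a single namespace. `execute_merged` is a hypothetical variant of `execute`, and the `context` dictionary is a stand-in for an entry of frame_info.

```python
import textwrap


def execute_merged(source: str, context: dict):
    """
    Run the snippet in one merged namespace (locals shadow globals) and
    return that namespace so results can be read back.
    """
    namespace = {**context["globals"], **context["locals"]}
    exec(compile(textwrap.dedent(source), "<string>", "exec"), namespace)
    return namespace


# Stand-in for an entry of frame_info
context = {"globals": {}, "locals": {"factor": 10}}

snippet = """
def scale():
    return factor * 2

result = scale()
"""

try:
    # Separate dicts: scale() looks up `factor` in globals only and fails
    exec(snippet, context["globals"].copy(), context["locals"].copy())
except NameError as err:
    print("separate dicts:", err)

print("merged:", execute_merged(snippet, context)["result"])
```

The trade-off is that the merged version no longer writes new variables back into the stored locals; you read them from the returned namespace instead.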
We could also bring the frame’s locals into IPython’s namespace like in Andy’s approach, but there’s a risk of overwriting existing variables. Best to be selective about what we bring in.
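Being selective could look something like the sketch below: copy only the names you ask for, and refuse to clobber anything that already exists unless told otherwise. `pull_variables` is a hypothetical helper, and the two dictionaries stand in for frame_info[i]["locals"] and the notebook namespace.

```python
def pull_variables(frame_locals, names, namespace, overwrite=False):
    """
    Copy the selected variables from a frame's locals into `namespace`.
    Validates every name first so that nothing is copied on failure.
    """
    for name in names:
        if name not in frame_locals:
            raise KeyError(f"{name!r} is not a local of this frame")
        if name in namespace and not overwrite:
            raise ValueError(f"{name!r} already exists in the target namespace")
    for name in names:
        namespace[name] = frame_locals[name]
    return list(names)


# Stand-ins for frame_info[i]["locals"] and the notebook's namespace
frame_locals = {"data": [0, 1, 2], "self": object()}
notebook_ns = {"model": "already defined"}

pull_variables(frame_locals, ["data"], notebook_ns)
print(sorted(notebook_ns))
```

In a notebook, the call would presumably be pull_variables(frame_info[4]["locals"], ["data"], globals()).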