Summary
In order to explain the use case of mixing a context manager and an iterator, I will spend the next few articles introducing each concept.
We will start with how to use context managers, a handy Python feature that encourages best practices for acquiring and releasing resources. You have probably already used one when opening a file:
with open('filename.txt', 'w') as dst:
    dst.write('content to put in the file')
Context managers are good for everything that needs a setup and a teardown, and the standard library comes with contextlib, a module packed with ready-to-use managers:
contextlib.chdir: to change the current working directory and restore it.
contextlib.redirect_stdout and contextlib.redirect_stderr: to capture print().
contextlib.suppress: to ignore errors.
contextlib.closing: to close anything that can be closed.
Of course, you don't have to use with all the time, with everything. It's fine to do it manually sometimes, or not at all. Python allows for quick and dirty scripting as well, and that's part of the appeal.
Feeling inspired
Because of “This will be easy”, I updated my knowledge on libs for retrying stuff. I discovered that Hynek Schlawack, author of the popular attrs and structlog, also wrote stamina, an opinionated wrapper for tenacity, a fork of retrying, which used to be my go-to tool to solve the problem.
At this point you are wondering what the heck this has to do with context managers. It turns out stamina is using a pattern I have never seen before: mixing a context manager with an iterator. I think it's quite a brilliant interface, and I decided to use it as the common thread to run through the next articles.
First, one article on how to use context managers, then one on how to build your own. Then one on how to use iterators (remember, the start of the week is for beginner-friendly articles, while the end of the week assumes more experience), then one on how to create your own.
Finally, one article on using the design pattern from stamina (well, from tenacity, really, as it originated there), and one on how to make them both work together. The first series of the blog. Substack is not ideal for making a series, as linking is limited, but it will have to do.
The with keyword
Most Python developers have used at least one context manager: open().
Tutorials everywhere will tell you to do this:
with open('filename.txt', 'w') as dst:
    dst.write('content to put in the file')
And, Pareto-style, it's the best practice most of the time, which you know I strongly believe in.
Why? You can open a file this way:
dst = open('filename.txt', 'w')
dst.write('content to put in the file')
And for a small script, it will work. But if your script becomes more complicated, you will want to choose when the file is closed:
dst = open('filename.txt', 'w')
dst.write('content to put in the file')
dst.close()
Otherwise, you may have some data never committed to the file, or some concurrent access may bite you.
In fact, there is even a limit to how many files your process is allowed to open, so you may consume too many OS file descriptors. On Linux:
>>> import shutil
>>> import os
>>> os.makedirs('/tmp/test')
>>> files = []
>>> for x in range(1025):
...     f = open(f'/tmp/test/{x}', 'w')
...     files.append(f)
...
Traceback (most recent call last):
  Cell In[8], line 2
    f = open(f'/tmp/test/{x}', 'w')
OSError: [Errno 24] Too many open files: '/tmp/test/1012'
But closing files manually is verbose, because you also have to take into consideration that the file may not be readable (or writable):
dst = open('filename.txt', 'w')
try:
    dst.write('content to put in the file')
finally:  # always do this even if .write() fails
    dst.close()
So this is what context managers give you: when you apply the with keyword to them, they will do something in a try, then when you get out of the with block, they will do another thing in a finally.
You have strong guarantees that this last action, which is often a clean-up, will be performed. It's also a visual thing, as it makes the scope of the action very clear. In our example, it's obvious that you only use the file between the use of the with keyword and the end of the with block. It encourages best practices and readable code.
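To give you an idea of the mechanics, here is a simplified sketch of what the with block above boils down to (the real protocol also passes exception details to __exit__ and can suppress them, but the idea is the same):
manager = open('filename.txt', 'w')
dst = manager.__enter__()  # setup: for a file, this just returns the file object
try:
    dst.write('content to put in the file')
finally:
    # tear down: always runs, even if .write() fails
    manager.__exit__(None, None, None)  # closes the file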
It's a nice feature, and even TypeScript is pushing for it, albeit a nerfed version inspired by C#.
contextlib
Context managers are not just for files. It's a generic concept that applies to anything you need to set up, then tear down after. Any resource you want to handle, or process you want to reverse, is a good candidate.
The Python standard library comes with a bunch of useful ones.
contextlib.chdir()
Inspired by fabric, and brand new in 3.11, it changes the current working directory, and restores it when you exit the block:
import os, contextlib

print(os.getcwd())
with contextlib.chdir('/home/'):
    print(os.getcwd())
    with contextlib.chdir('/'):
        print(os.getcwd())
    print(os.getcwd())
print(os.getcwd())
This outputs:
/drive
/home
/
/home
/drive
contextlib.redirect_stdout() and .redirect_stderr()
Sometimes, you call code that makes the indelicate choice of printing things instead of returning them. In that case, you can temporarily redirect the standard outputs into a buffer, and collect the precious data:
import contextlib, io

def selfish_add(a, b):
    print(a + b)

# put the result into an io.StringIO() buffer
with contextlib.redirect_stdout(io.StringIO()) as output:
    selfish_add(1, 2)

result = int(output.getvalue())  # that's text
print(result + 1)
Which will happily display '4'.
As you can imagine, redirect_stderr captures stderr instead of stdout.
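For instance, here is a minimal sketch capturing a message that some code writes to sys.stderr:
import contextlib, io, sys

with contextlib.redirect_stderr(io.StringIO()) as errors:
    print('something went wrong', file=sys.stderr)

print(errors.getvalue())  # 'something went wrong\n'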
contextlib.suppress()
Unlike PHP, Python doesn't have a shortcut to silence errors, which is a good thing. Having dealt with code riddled with @ 20 years ago, I can attest it's not a fun experience to debug.
Now sometimes, you do want to silence errors. E.g., you are deleting a temporary file, and you know it might already have been cleaned up. This can be done with a try/except, but we can make it even more explicit with:
import contextlib, os

with contextlib.suppress(FileNotFoundError):
    os.remove('/tmp/new_google_service')
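For comparison, the try/except version mentioned above would look like this, with the same effect but more ceremony:
import os

try:
    os.remove('/tmp/new_google_service')
except FileNotFoundError:
    pass  # the file was already gone, and that's fine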
contextlib.closing()
Context managers are excellent at closing things. Not just files, anything really. Connections to databases, HTTP sessions, your heart after you agreed to just be good friends...
But what if the object you need to close isn't a context manager? Well, contextlib.closing() will take care of this, by diligently calling .close() on anything you pass to it:
>>> import contextlib
... class Door:
...     def close(self):
...         print('thud')
...
... with contextlib.closing(Door()):
...     print("Deep voice: I'm in")
...
Deep voice: I'm in
thud
Unfortunately, it can only close things with a .close() method. There is no parameter to tell it to call .disconnect() or .itsover() instead.
You can customize it, though:
import contextlib

class slamming(contextlib.closing):
    def __exit__(self, *exc_info):
        # closing() stores the wrapped object as self.thing
        self.thing.slam()
How this works will be made clearer in the next article, but how to use it stays the same:
>>> import contextlib
... class Door:
...     def slam(self):
...         print('bam!')
...
... with slamming(Door()):
...     print("Deep voice: I'm in")
...
Deep voice: I'm in
bam!
There is a lot more in there
contextlib is richer than that, but the rest is mostly for advanced usage, and some of it requires knowing what I'll write about this weekend, so we will stop here.
You should know that context managers are used everywhere, not just inside the stdlib. You will find them in Django transactions, structlog context variables and pandas option contexts.
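For instance, here is a minimal sketch with pandas' option context (assuming you have pandas installed):
import pandas as pd

with pd.option_context('display.max_rows', 5):
    print(pd.get_option('display.max_rows'))  # 5 inside the block

print(pd.get_option('display.max_rows'))  # back to the default outside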
It's a good tool to be comfortable with. In fact, you should be actively looking for them in the docs, every time you handle something that needs cleaning up.
They come with a few surprises sometimes, like when you use generators that depend on a file handle. But it's a good thing: it will force you to think about the real scope of your data.
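Here is a quick sketch of the kind of surprise I mean:
with open('/etc/fstab') as f:
    lines = (line for line in f)

# the with block already closed the file, so consuming the generator
# now raises "ValueError: I/O operation on closed file"
print(next(lines))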
You don't have to use "with open()" all the time
Yes, using with for opening files is a good practice. But you don't need to follow the good practices all the time.
Firstly, when reading files, dst = open() is fine for small scripts and in the shell. There is little risk of critical problems, it's easier and faster to type, and you can pass the file object into anything that accepts an iterable. It's very handy. Don't worry, the file will be closed by Python automatically when the reference is released, or when the program stops.
E.g.:
>>> lines = (line for line in open('/etc/fstab') if line.strip() and not line.startswith('#'))
>>> print(*lines, sep="")
UUID=10654268-0809-4f8f-8f8e-e1a2811e1688 / ext4 errors=remount-ro 0 1
UUID=7C68-250E /boot/efi vfat umask=0077 0 1
UUID=23fa1a22-ab1f-418d-b00f-363ca8f92261 none swap sw 0 0
UUID=69306B771D56D4AA /media/shared ntfs rw,nosuid,nodev,noatime,allow_other 0 0
Secondly, unless you are dealing with a 16GB Linux ISO file, loading the entire content in memory is OK. And for this, pathlib is your best friend:
>>> from pathlib import Path
>>> Path('small_file').write_text("life can be simple")
18
>>> Path('small_file').read_text()
'life can be simple'
That's why I'm rooting for a Path literal.
Jump to the next article to see how to create our own context managers, and why to do so.