Summary
Now that you are comfy with iteration, let’s see how to:
Get the first element of any iterable: next(iter(iterable), default).
Flatten one nesting level of an iterable: itertools.chain.from_iterable(nested_iterable).
Streamline nested loops: itertools.product(iterable1, iterable2).
Categorize elements: itertools.groupby(sorted(holiday_tour, key=categ), categ).
Iterate on a sliding window or on chunks of an iterable with itertools recipes.
Use comprehensions in all() and any(). E.g.: any(line.startswith('#') for line in open('/etc/fstab')).
Getting the first element of any iterable
Getting the first element of an indexable like a tuple or a list is easy:
>>> names = ("Naboo", "Namek", "Arrakis")
>>> names = [n.upper() for n in names]
>>> print(names[0])
NABOO
However, this doesn't work with all iterables, such as generators:
>>> names = ("Naboo", "Namek", "Arrakis")
>>> names = (n.upper() for n in names)
>>> print(names[0])
TypeError: 'generator' object is not subscriptable
It will also fail on an empty list:
>>> names = ("Naboo", "Namek", "Arrakis")
>>> names = [n.upper() for n in names if "a" not in n]
>>> print(names[0])
IndexError: list index out of range
The solution is to get an iterator and request its first element with next(). It works with any iterable:
>>> names = ("Naboo", "Namek", "Arrakis")
>>> names = (n.upper() for n in names) # generators have no index
>>> print(next(iter(names)))
NABOO
Even unordered ones, although you can't really call it the first element since there is no order (with a set of strings, which element comes out can even change from one run to the next):
>>> names = ("Naboo", "Namek", "Arrakis")
>>> names = {n.upper() for n in names} # sets have no order
>>> print(next(iter(names)))
NAMEK
And the second parameter lets you define a default value, which is returned if the iterator is empty. So it works with an empty list:
>>> names = ("Naboo", "Namek", "Arrakis")
>>> names = [n.upper() for n in names if "a" not in n]
>>> print(next(iter(names), "HYPERION"))
HYPERION
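One thing worth keeping in mind: calling iter() on a generator returns the generator itself, so next() actually consumes the element you grab. On a list, iter() builds a fresh iterator and the list stays intact. A small sketch to see the difference (the variable names are just for illustration):

```python
names = (n.upper() for n in ("Naboo", "Namek", "Arrakis"))
first = next(iter(names), "HYPERION")  # iter() on a generator returns the generator itself
rest = list(names)
print(first, rest)  # NABOO ['NAMEK', 'ARRAKIS']: NABOO was consumed

planets = ["NABOO", "NAMEK", "ARRAKIS"]
first_planet = next(iter(planets), "HYPERION")  # iter() on a list builds a new iterator
print(first_planet, planets)  # NABOO ['NABOO', 'NAMEK', 'ARRAKIS']: the list is untouched
```

So if you need the rest of a generator afterwards, remember the first element is already gone from it.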
Note that if you are sure the iterable contains one, and only one, element, unpacking can do the job:
>>> names = ("Naboo", "Namek", "Arrakis")
>>> name, = [n.upper() for n in names if n.startswith("A")]
>>> print(name)
ARRAKIS
But one-element tuples are very hard to spot. I'm sure you had to double-check the comma over there.
So instead, use the bracket-based syntax:
>>> names = ("Naboo", "Namek", "Arrakis")
>>> [name] = [n.upper() for n in names if n.startswith("A")]
>>> print(name)
ARRAKIS
Much easier to read.
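A nice side effect: this unpacking doubles as a check. If there are zero elements, or more than one, Python raises a ValueError instead of silently picking one. Here is a quick sketch, using a hypothetical only_one() helper just to wrap the unpacking:

```python
names = ("Naboo", "Namek", "Arrakis")

def only_one(iterable):
    # Hypothetical helper: bracket unpacking raises ValueError
    # unless the iterable yields exactly one element
    [item] = iterable
    return item

print(only_one(n for n in names if n.startswith("A")))  # Arrakis

failures = []
for prefix in ("N", "Z"):  # "N" matches two names, "Z" matches none
    try:
        only_one(n for n in names if n.startswith(prefix))
    except ValueError as error:
        failures.append(prefix)
        print(prefix, "->", error)
```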
itertools, my very best friend
In the last iterable-related article, I encouraged you to look at the itertools library, which contains a lot of utilities to manipulate iterables, even lazy ones, without loading them all in memory.
There is a lot in this lib, and I'm not going to review everything, but here are a few of my favorite features.
Flatten with chain.from_iterable()
chain(iterable1, iterable2, ...) links several iterables together as if they were one. It's like [*iterable1, *iterable2, ...], but lazy. However, I use it mostly for its method from_iterable(), which will create a chain automatically from an iterable ... of iterables.
This allows flattening one level of nesting easily:
>>> travels = {"Chulak": "Earth", "Abydos": "Alpha Site", "Tollana": "Parva" }
>>> travels.items()
dict_items([('Chulak', 'Earth'), ('Abydos', 'Alpha Site'), ('Tollana', 'Parva')])
>>> from itertools import chain
>>> destinations = list(chain.from_iterable(travels.items()))
>>> destinations
['Chulak', 'Earth', 'Abydos', 'Alpha Site', 'Tollana', 'Parva']
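To make the difference between the two forms concrete, and to show that only one level gets flattened, here is a small sketch with made-up data:

```python
from itertools import chain

# chain() takes the iterables as separate arguments...
joined = list(chain("ab", [1, 2]))
print(joined)  # ['a', 'b', 1, 2]

# ...while chain.from_iterable() takes a single iterable of iterables
nested = [[1, 2], [3, [4, 5]], (6,)]
flat = chain.from_iterable(nested)  # lazy: nothing is read yet
flat_list = list(flat)
print(flat_list)  # [1, 2, 3, [4, 5], 6]: the inner [4, 5] stays nested
```

If you need to flatten deeper structures, you'll have to call it again or write a recursive function.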
Reduce nested loops with product()
Nesting loops is never great: it makes the code harder to read and harder to compose, and it pushes you toward the line length limit quickly.
itertools.product() can solve that problem if what you need is the Cartesian product of two iterables, meaning every element of one iterable combined with every element of another one.
E.g., this nested loop:
>>> worlds = ["Trantor", "Terminus", "Earth"]
... states = ["Colonization", "Apogee", "Decadence"]
... for world in worlds:
...     for state in states:
...         print(world, ":", state)
...
Trantor : Colonization
Trantor : Apogee
Trantor : Decadence
Terminus : Colonization
Terminus : Apogee
Terminus : Decadence
Earth : Colonization
Earth : Apogee
Earth : Decadence
Can be reduced to:
>>> from itertools import product
>>> worlds = ["Trantor", "Terminus", "Earth"]
... states = ["Colonization", "Apogee", "Decadence"]
... for world, state in product(worlds, states):
...     print(world, ":", state)
...
Trantor : Colonization
Trantor : Apogee
Trantor : Decadence
Terminus : Colonization
Terminus : Apogee
Terminus : Decadence
Earth : Colonization
Earth : Apogee
Earth : Decadence
There is still a nested loop happening: this doesn't make things faster, just cleaner.
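product() isn't limited to two iterables, and its repeat parameter saves you from passing the same one several times. A quick sketch with throwaway values:

```python
from itertools import product

# Three iterables: one tuple per combination, the last one varying fastest
combos = list(product("AB", [1, 2], ["x"]))
print(combos)  # [('A', 1, 'x'), ('A', 2, 'x'), ('B', 1, 'x'), ('B', 2, 'x')]

# repeat=2 is the same as product(bits, bits)
bits = [0, 1]
pairs = list(product(bits, repeat=2))
print(pairs)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```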
Groupby, the function I always forget
itertools.groupby() is like divmod() or vars(): I always forget they exist until I hit a problem that needs exactly them. It's not that often, but when it happens, it's so convenient.
This function takes an iterable and groups its elements according to a second function that returns the category each element belongs to. That second function is called the “key”.
E.g., you want to group all the places by their number of characters (len() counts spaces and apostrophes too):
>>> holiday_tour = [
... "Centauri Prime",
... "Ganymede",
... "Minbar",
... "Pak'ma",
... "Ragesh III",
... "Seti Gamma II",
... "Z'ha'dum",
... "Balos",
... "Beta Colony",
... "Coriana VI",
... "Disney Planet",
... "Epsilon III"
... ]
... groups = {}
... for place in holiday_tour:
...     groups.setdefault(len(place), []).append(place)
...
... for n, places in sorted(groups.items()):
...     print(f"{n} letters:")
...     for place in places:
...         print(' -', place)
...
5 letters:
- Balos
6 letters:
- Minbar
- Pak'ma
8 letters:
- Ganymede
- Z'ha'dum
10 letters:
- Ragesh III
- Coriana VI
11 letters:
- Beta Colony
- Epsilon III
13 letters:
- Seti Gamma II
- Disney Planet
14 letters:
- Centauri Prime
Groupby can do that:
from itertools import groupby
for n, places in groupby(sorted(holiday_tour, key=len), len):
    print(f"{n} letters:")
    for place in places:
        print(' -', place)
5 letters:
- Balos
6 letters:
- Minbar
- Pak'ma
8 letters:
- Ganymede
- Z'ha'dum
10 letters:
- Ragesh III
- Coriana VI
11 letters:
- Beta Colony
- Epsilon III
13 letters:
- Seti Gamma II
- Disney Planet
14 letters:
- Centauri Prime
Just make sure you always sort your iterable first, and use the same key for the sorting as for the grouping: groupby() only groups consecutive elements that share the same key.
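Here is what happens if you forget the sorting step, on a tiny made-up list of three places:

```python
from itertools import groupby

places = ["Minbar", "Balos", "Pak'ma"]  # lengths 6, 5, 6

# Unsorted: groupby() only merges *consecutive* elements with the same key
unsorted_groups = [(n, list(g)) for n, g in groupby(places, len)]
print(unsorted_groups)  # [(6, ['Minbar']), (5, ['Balos']), (6, ["Pak'ma"])]

# Sorted with the same key: each length shows up only once
sorted_groups = [(n, list(g)) for n, g in groupby(sorted(places, key=len), len)]
print(sorted_groups)  # [(5, ['Balos']), (6, ['Minbar', "Pak'ma"])]
```

Also note that each group is a lazy iterator sharing the underlying data with groupby(): once you move on to the next key, the previous group is gone, which is why the comprehension turns every group into a list right away.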
And if sorting isn't familiar to you, I'll probably write an article about it soon.
Chunked and sliding windows
No matter how good itertools is, it doesn't contain everything. In fact, the documentation lists great recipes that are explicitly not included in the module. It has always been a source of frustration: those snippets are great and everybody needs them one day, so why not include them in the stdlib?
This is why more-itertools was born.
But sometimes, you just want a small function or two, and don't want to install a 3rd-party lib just for that.
Here are my two favorites, which I often copy/paste when I don't feel like firing up pip.
Sliding windows
Iterate on something a few elements at a time, each window overlapping the previous one on all its elements but the newest:
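from itertools import islice
from collections import deque

def sliding_window(iterable, size):
    iterator = iter(iterable)
    # Fill the first window with up to `size` elements
    window = deque(islice(iterator, size), maxlen=size)
    # Only yield it if the iterable was long enough to fill it
    if len(window) == size:
        yield tuple(window)
    # Each new element pushes the oldest one out, thanks to maxlen
    for element in iterator:
        window.append(element)
        yield tuple(window)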
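The whole trick relies on deque's maxlen: once the deque is full, appending on the right silently drops an element on the left. In isolation, with arbitrary numbers:

```python
from collections import deque

window = deque([1, 2, 3], maxlen=3)
window.append(4)  # the deque is full, so the oldest element (1) is dropped
print(tuple(window))  # (2, 3, 4)
```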
E.g., we have a list of dates, and we want to know the number of days between each of them and the next one:
>>> import datetime as dt
>>> events = [
... dt.date(2023, 2, 11),
... dt.date(2023, 4, 5),
... dt.date(2023, 6, 5),
... dt.date(2023, 7, 2),
... dt.date(2023, 7, 26)
... ]
...
... for start_date, end_date in sliding_window(events, 2):
...     days = (end_date - start_date).days
...     print(f' - {days} days')
...
- 53 days
- 61 days
- 27 days
- 24 days
Starting from Python 3.10, itertools.pairwise does this, but only for windows of 2 elements.
Chunked
Named "grouper" in the recipe, this lets you iterate on something a few elements at a time, but no elements are overlapping:
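def chunked(iterable, n):
    # n references to the *same* iterator: zip() pulls n elements per tuple
    # and stops at the first incomplete chunk, dropping the leftovers
    args = [iter(iterable)] * n
    return zip(*args)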
E.g., you have a list of students, and you want to randomly pair them up:
>>> students = [
... "Marie",
... "Pierre",
... "Jean",
... "Anne",
... "Paul",
... "Catherine",
... "François",
... ]
...
... import random
...
... random.shuffle(students)
... for student1, student2 in chunked(students, 2):
...     print(f'- {student1} with {student2}')
...
- Pierre with François
- Marie with Catherine
- Jean with Anne
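The recipe has since been updated to support more variants of zip() than this one, but I still have the old one on speed dial because it's 3 lines of code. Just keep in mind that, like zip(), it silently drops an incomplete last chunk: with 7 students, one of them (here, Paul) gets no partner.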
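If you'd rather keep the leftovers, one option is to swap zip() for itertools.zip_longest(), which pads the last chunk with a fill value. A minimal sketch, where chunked_filled() is a hypothetical name chosen for this example:

```python
from itertools import zip_longest

def chunked_filled(iterable, n, fillvalue=None):
    # Same trick as chunked(): n references to the *same* iterator,
    # but zip_longest() pads the last chunk instead of dropping it
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(chunked_filled("ABCDEFG", 3, fillvalue="x")))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

For the students, that would pair the last one with None, which you can then handle explicitly (e.g., make a group of three).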
Comprehension with all() / any()
So few people know we have all() and any() as built-in functions. They're right there, you just have to type them!
The first one checks if all elements in an iterable are truthy. It's like a super "and".
The second, you guessed it, checks if at least one is truthy, like a super "or".
Their coolness is that they work with any iterable, which means you can apply them to a generator, and they stop as soon as the answer is known.
E.g., does this file contain a commented line?
>>> if any(line.startswith('#') for line in open('/etc/fstab')):
...     print("yep")
...
yep
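With a generator, that early stop means the remaining elements are never even computed: any() returns at the first truthy value, all() at the first falsy one. A small sketch to watch it happen, using a hypothetical is_positive() check that records what it inspects:

```python
inspected = []

def is_positive(n):
    inspected.append(n)  # record every element the check actually looks at
    return n > 0

found = any(is_positive(n) for n in [-1, 5, -2, 7])
print(found, inspected)  # True [-1, 5]: any() stopped at the first truthy result

inspected.clear()
every = all(is_positive(n) for n in [3, -1, 4])
print(every, inspected)  # False [3, -1]: all() stopped at the first falsy result

print(all([]), any([]))  # True False: the empty cases
```

The empty cases are worth remembering: all() of nothing is True, any() of nothing is False.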