Summary
Python iteration is made simple thanks to the inclusion of the iterator design pattern, available through the for loop:
>>> for n in (1, 2, 3):
...     print(n)
1
2
3
Behind the scenes, a for loop is really a while plus a try/except in a trench coat:
>>> iterator = iter((1, 2, 3))
>>> while True:
...     try:
...         print(next(iterator))
...     except StopIteration:
...         break
...
1
2
3
Because the iterator object ensures a universal interface across all iterable objects, you can "get the next element" in the same way for a list, a tuple, a dict or a file. Hence, you can use them with any function accepting an iterable as input, any comprehension, and even unpacking.
This allows for a coding style in Python where you start considering iterables as pipes that you plug into each other, and watch the data flow:
>>> animals = ['bird', 'cat', 'dog', 'fish'] # iterable
>>> noises = ["chip", "mew", "bark", "?"] # iterable
>>> pairs = zip(animals, noises) # 2 iterables in, 1 out
>>> texts = map(" ".join, pairs) # 1 iterable in, 1 out
>>> print(*texts, sep="\n") # unpacking the last iterable as func params
bird chip
cat mew
dog bark
fish ?
Iterables, iterables everywhere!
Keeping in mind I want to eventually talk about mixing generators with context managers, we will now move to the iteration part. As before, we'll first do an intro, then a deeper dive in the next article.
Now for the intro, we'll talk about iteration, meaning the process of asking for the next element again and again.
In Python, this is done with the for loop:
>>> for n in (1, 2, 3):
...     print(n)
1
2
3
As you may know, iteration is everywhere in Python.
First, there are a lot of built-in data structures that you can use for on: tuples, lists, sets, strings, dicts, files...
They are all what we call "iterables", meaning something you can put in a for loop. It doesn't matter which one, it works the same way.
Tuple:
>>> for n in (1, 2, 3):
...     print(n)
1
2
3
List:
>>> for n in [1, 2, 3]:
...     print(n)
1
2
3
Dict:
>>> for n in {1: 1, 2: 2, 3: 3}:
...     print(n)
1
2
3
String:
>>> for n in "123":
...     print(n)
1
2
3
File:
>>> from pathlib import Path
>>> Path('numbers.txt').write_text('1\n2\n3\n') # create an example file
>>> for n in open('numbers.txt'):
...     print(n)
1
2
3
Python doesn't care whether they are lists or tuples; what matters is that the for loop can ask, at each turn, what the next element is.
This is a beautiful thing: it means that as soon as you say "I want an iterable", you are suddenly compatible with all those things.
E.g., the function sum() accepts any iterable of int. It doesn't care where they come from:
>>> sum((1, 2, 3)) # tuple
6
>>> sum({1, 2, 3}) # set
6
>>> sum([1, 2, 3]) # list
6
For sum(), it's all the same.
Now, iteration is only about "getting the next element".
It's not about having elements at indexes.
E.g., files can't be indexed:
>>> open('numbers.txt')[0]
...
TypeError: '_io.TextIOWrapper' object is not subscriptable
It's not about having an order.
E.g., you can't rely on set order:
>>> {1, "a", 1j, 3}
{1, 1j, 3, 'a'}
It's not about containing anything.
E.g., you can make a huuuuuuuuge range, but it doesn't eat all your memory because it doesn't actually contain all those numbers:
>>> range(100000000000000000000000000000000000000000000000000000)
range(0, 100000000000000000000000000000000000000000000000000000)
And finally, it's not even about the elements staying around:
>>> steps = enumerate(["Collect underpants.", "?", "PROFIT"], 1)
>>> for num, step in steps:
...     print(num, step)
...
1 Collect underpants.
2 ?
3 PROFIT
>>> list(steps) # steps is empty!
[]
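To make that last point concrete, here is a small sketch (the variable names are made up for the illustration) contrasting a list, which you can walk as many times as you want, with the one-shot object returned by enumerate():

```python
# A list is a reusable iterable: each loop asks it for a fresh iterator.
colors = ["red", "green", "blue"]
first_pass = [c for c in colors]
second_pass = [c for c in colors]

# enumerate() returns a one-shot iterator: once consumed, nothing is left.
numbered = enumerate(colors, 1)
first_numbered = list(numbered)
second_numbered = list(numbered)

print(first_pass == second_pass)  # True, the list can be walked twice
print(first_numbered)             # [(1, 'red'), (2, 'green'), (3, 'blue')]
print(second_numbered)            # [] because the iterator is exhausted
```

If you need to go through such a one-shot object several times, one option is to store its elements in a list first.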
So iteration is really about "getting the next element" and only about this.
But how does Python request the next element for so many different things?
The iterator protocol
A long time ago, in a galaxy far away, Python 2.2 (!) adopted PEP 234, making the iterator design pattern an official part of the language.
Yes, iterator was a design pattern. Most things in languages are formalized design patterns extracted from what has worked in the field for the previous generations of devs. Even the concept of a function didn't exist at some point, but coders got into the habit of storing assembler registers in a certain way, and one day, language designers decided it was very convenient, and made it part of their new baby.
In the same way, people realized that applying the same process to a lot of elements was a really common task, and so patterns like iterator and visitor were used more and more, until they were formalized.
The way Python implements the iterator pattern is by having each iterable object come with another object called, you guessed it, the iterator.
The iterator knows how to get the next element for this particular object. It is, indeed, specialized for this particular object. For example, a list iterator only knows how to get the next element out of a list. So the for loop gets the iterator, and requests the next element again and again until there are no more.
This is how we can do it manually:
>>> iterable = (1, 2, 3)
>>> iterator = iter(iterable) # get the specialized iterator
>>> iterator
<tuple_iterator object at 0x7f4b96955d10>
>>> next(iterator) # ask the iterator for the next element
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> next(iterator) # StopIteration exception is raised when no next element
...
StopIteration
You’ll notice that this works with any iterable. That's the point. It can be a tuple, a list, a set, it doesn't matter.
Let's do it with a file:
>>> from pathlib import Path
>>> Path('numbers.txt').write_text('1\n2\n3\n') # create an example file
>>> iterable = open('numbers.txt')
>>> iterator = iter(iterable) # get the specialized iterator
>>> iterator # this time, it's an iterator specialized in files
>>> print(next(iterator)) # ask the iterator for the next element
1
>>> print(next(iterator))
2
>>> print(next(iterator))
3
>>> print(next(iterator)) # StopIteration again after the last file line
...
StopIteration
This is universal: the iterator is the adapter that smooths over the differences between all iterables.
It’s why all the things you can pass to a for loop are compatible.
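To see that adapter from the inside, here is a minimal sketch of a home-made iterable. The Countdown and CountdownIterator classes are invented for this example. It relies on the fact that iter() calls the object's __iter__() method and next() calls the iterator's __next__() method:

```python
class CountdownIterator:
    """Hypothetical iterator: knows how to get the next element of a Countdown."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # By convention, calling iter() on an iterator returns the iterator itself.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that there is no next element
        value = self.current
        self.current -= 1
        return value


class Countdown:
    """Hypothetical iterable: hands out a fresh iterator on each iter() call."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


countdown = Countdown(3)
iterator = iter(countdown)
print(next(iterator))   # 3
print(next(iterator))   # 2
print(next(iterator))   # 1
print(list(countdown))  # [3, 2, 1], list() asked for a new iterator
```

Since Countdown follows the same protocol as tuples and files, it can go anywhere an iterable is expected, including a for loop.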
Also, did you notice something funny?
We only know the last element has been reached when we try to get one more and it raises an exception.
This means a for loop, in Python, can be seen as a while loop with a try/except.
>>> for x in (1, 2, 3):
...     print(x)
...
1
2
3
Under the hood, that loop is basically:
>>> iterator = iter((1, 2, 3))
>>> while True:
...     try:
...         print(next(iterator))
...     except StopIteration:
...         break
...
1
2
3
That's Python's dirty little secret!
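The same idea can be wrapped in a function. This is only a sketch: for_each is a hypothetical helper written for this article, not something Python provides, and the real for statement is implemented by the interpreter itself:

```python
def for_each(iterable, action):
    """Hypothetical helper mimicking `for element in iterable: action(element)`."""
    iterator = iter(iterable)          # step 1: get the iterator
    while True:
        try:
            element = next(iterator)   # step 2: ask for the next element
        except StopIteration:
            break                      # step 3: no more elements, stop looping
        action(element)


collected = []
for_each("abc", collected.append)             # a string
for_each({"x": 1, "y": 2}, collected.append)  # a dict, iterates over keys
print(collected)  # ['a', 'b', 'c', 'x', 'y']
```

Note that only the call to next() sits inside the try block, so a StopIteration raised by the action itself is not silently swallowed.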
Just give me an iterable
Because of this wonderful property, you can start accepting any iterable as input to your function. Remember in the article "Sculpting a function", we had something like this:
def iformat(elements):
    for element in elements:
        print("-", element)
You can format a list of numbers:
>>> iformat([1, 2, 3])
- 1
- 2
- 3
But because iteration is universal, as long as an object is iterable, this function will work:
>>> iformat((True, False, None))
- True
- False
- None
>>> iformat(open('numbers.txt'))
- 1
- 2
- 3
>>> iformat({"blue": 1, "green": 0})
- blue
- green
This offers a lot of flexibility in the design of your code, and of course, Python itself uses this flexibility.
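Here is another small sketch of the same design choice, this time with a function that returns a value instead of printing. The longest() function is a made-up example; the only thing it asks of its argument is that it can be looped over:

```python
def longest(words):
    """Hypothetical helper: return the longest string from any iterable of strings."""
    best = ""
    for word in words:  # the only requirement on `words` is being iterable
        if len(word) > len(best):
            best = word
    return best


print(longest(["cat", "giraffe", "dog"]))        # list -> giraffe
print(longest(("cat", "giraffe", "dog")))        # tuple -> giraffe
print(longest({"cat": 1, "giraffe": 2}))         # dict (keys) -> giraffe
print(longest(iter(["cat", "giraffe", "dog"])))  # bare iterator -> giraffe
```

Had the function used words[0] or len(words), it would have silently restricted itself to sequences, and the last two calls would behave differently or fail.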
Built-in functions like enumerate(), zip(), sorted(), max(), map(), all() and set() all work on most iterables. Some, like sorted(), require loading everything in memory, so they will not work on infinite iterables, but it's an edge case.
Just give them iterables:
>>> sorted({"cat", "dog", "fish", "bird"})
['bird', 'cat', 'dog', 'fish']
>>> sorted(["cat", "dog", "fish", "bird"])
['bird', 'cat', 'dog', 'fish']
>>> noises = ["chip", "mew", "bark", "?"]
>>> for animal, cry in zip(['bird', 'cat', 'dog', 'fish'], noises):
...     print(animal, cry)
...
bird chip
cat mew
dog bark
fish ?
>>> set('abcabcabcabc')
{'b', 'a', 'c'}
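About that infinite iterable edge case: the sketch below uses count() and islice() from the standard itertools module, which this article doesn't otherwise cover, to show that lazy built-ins such as zip() and map() are perfectly happy with an endless input:

```python
from itertools import count, islice

naturals = count(1)  # 1, 2, 3, ... without end

# zip() stops as soon as one of its inputs is exhausted,
# so pairing an infinite iterable with a finite one is safe.
animals = ["bird", "cat", "dog", "fish"]
numbered = list(zip(naturals, animals))
print(numbered)  # [(1, 'bird'), (2, 'cat'), (3, 'dog'), (4, 'fish')]

# map() is lazy too: nothing is computed until an element is requested.
squares = map(lambda n: n * n, count(1))
first_five = list(islice(squares, 5))
print(first_five)  # [1, 4, 9, 16, 25]

# sorted(count(1)) would never return: it must read every element first.
```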
It becomes interesting once you realize you can chain iterables:
>>> lines = open('numbers.txt') # get an iterable of lines
>>> numbers = map(int, lines) # turn it into an iterable of integers
>>> sum(numbers) # sum accepts an iterable of numbers
6
Functions that accept and return iterables are like pipes that you can connect to each other.
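As a self-contained sketch of such a pipeline, the script below creates its own example file in a temporary directory (the path and the doubling step are just choices made for this illustration), then plugs three pipes together:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "numbers.txt"  # hypothetical example file
    path.write_text("1\n2\n3\n")

    with open(path) as lines:                    # iterable of lines
        numbers = map(int, lines)                # iterable of integers
        doubled = map(lambda n: n * 2, numbers)  # iterable of doubled integers
        total = sum(doubled)                     # pulls the data through the pipes

print(total)  # 12
```

Nothing is read from the file until sum() starts asking for elements, and each line goes through the whole chain before the next one is read.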
Unpacking
Remember unpacking? The feature that does this:
>>> numbers = [1, 2, 3]
>>> a, b, c = numbers
>>> a
1
>>> b
2
>>> c
3
Well, it also works with any iterable:
>>> a, b, c = {1: 1, 2: 2, 3: 3}
>>> print(a)
1
>>> print(b)
2
>>> print(c)
3
>>> print(*open('numbers.txt'))
1
2
3
>>> a, *b, c = range(1, 1000)
>>> a
1
>>> c
999
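Since unpacking only needs an iterable, you can rely on it inside your own functions. In this sketch, ends() is a hypothetical helper that returns the first and last elements of whatever you give it:

```python
def ends(iterable):
    """Hypothetical helper: return the first and last elements of any iterable."""
    first, *middle, last = iterable  # needs at least two elements
    return first, last


print(ends([10, 20, 30]))           # (10, 30)
print(ends("python"))               # ('p', 'n')
print(ends(range(1, 1000)))         # (1, 999)
print(ends(map(str.upper, "abc")))  # ('A', 'C')
```

Keep in mind that the starred name collects everything in between into a list, so this approach loads the whole iterable in memory.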
And the pipe analogy? It also works with unpacking:
>>> animals = ['bird', 'cat', 'dog', 'fish'] # iterable
>>> noises = ["chip", "mew", "bark", "?"] # iterable
>>> pairs = zip(animals, noises) # 2 iterables in, 1 out
>>> texts = map(" ".join, pairs) # 1 iterable in, 1 out
>>> print(*texts, sep="\n") # unpacking the last iterable as func params
bird chip
cat mew
dog bark
fish ?
Of course, you have to find the right balance between code that is expressive and code that is easy to read and debug. So don't abuse this, powerful as it is.
Comprehensions
We have now learned that a for loop can be used on any iterable.
And you know what else is a for loop? A comprehension:
>>> squares = [x * x for x in range(10)]
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
They are basically a shortcut to write this:
>>> squares = []
>>> for x in range(10):
...     squares.append(x * x)
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
So just like the regular for loop, they can be connected like pipes:
>>> numbers = map(int, open('numbers.txt'))
>>> squares = [x * x for x in numbers]
>>> squares
[1, 4, 9]
In fact, because they return an iterable (here a list), you can use them where an iterable is expected:
>>> sum([int(x) * int(x) for x in open('numbers.txt')])
14
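The same goes for the other kinds of comprehensions. A quick sketch, reusing the animal data from earlier examples with made-up variable names, shows list, set and dict comprehensions all fed by arbitrary iterables:

```python
animals = ("bird", "cat", "dog", "fish")  # a tuple this time
noises = ["chip", "mew", "bark", "?"]

# list comprehension over a tuple
lengths = [len(name) for name in animals]
# set comprehension over the same tuple
initials = {name[0] for name in animals}
# dict comprehension over the iterable returned by zip()
cries = {animal: noise for animal, noise in zip(animals, noises)}

print(lengths)           # [4, 3, 3, 4]
print(sorted(initials))  # ['b', 'c', 'd', 'f']
print(cries)             # {'bird': 'chip', 'cat': 'mew', 'dog': 'bark', 'fish': '?'}
```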
With this in mind
If this was new to you, this introduction to iteration should change how you see and write Python code.
Being iterable is an implicit interface (so much for “explicit is better than implicit”, huh?) that brings a whole universe of data structures that fit together: you can now picture many programs as a flow of data going through iterables that you connect to each other.
Funnily enough, this is a good foot in the door for functional programming. But we are not going to address this for now.
We will double down on the pipe thingy and learn about generators, which is the topic of the next article.