Summary
Sorting any iterable in Python can be done with sorted()
, which will always return a new list without modifying the original object.
It uses so-called natural ordering”.
E.G., strings are ordered alphabetically:
>>> sorted(("cat", "dog", "fish", "bird"))
['bird', 'cat', 'dog', 'fish']
You can customize sorting by using reverse
to get the reverse order, or passing a callback to specify a custom ordering criteria.
E.G., sort using the number of letters, from more to less:
>>> sorted(("cat", "dog", "fish", "bird"), reverse=True, key=len)
['fish', 'bird', 'cat', 'dog']
Returning a tuple in the key function lets you define the priority for various criterias in case of equality.
E.G., sort by length, but in case of equality, sort alphabetically:
>>> sorted(("cat", "dog", "fish", "bird"), key=lambda e: (len(e), e))
['cat', 'dog', 'bird', 'fish']
The same mechanism can be used to sort a list in place with list.sort()
or finding boundaries with min()
and max()
.
Finally, if you really need some custom ordering, you can define the value of an object compared to another using the dunder methods __eq__
, __gt__
and __lt__
which will be called when using the operators ==
, >
and <
.
A logic of some sort
In Python, sorting things is mostly done using the sorted()
function. You can apply it to any iterable, and it will return a sorted list
>>> sorted("zag")
['a', 'g', 'z']
>>> sorted(("cat", "dog", "fish", "bird"))
['bird', 'cat', 'dog', 'fish']
The original iterable is not modified, sorted()
always return a new object.
The elements of the iterable are sorted using a so-called "natural ordering":
strings are sorted alphabetically;
numbers are sorted numerically;
lists and tuples are compared one element at a time;
sets check if one contains the other;
dictionaries can't be compared.
You cannot sort things that can't be compared:
>>> sorted((1, "2"))
TypeError: '<' not supported between instances of 'str' and 'int'
>>> sorted(({}, {}))
TypeError: '<' not supported between instances of 'dict' and 'dict'
>>> sorted(([], ()))
TypeError: '<' not supported between instances of 'list' and 'tuple'
reverse
tells sorted()
to use the reverse order.
E.G., if we sort numbers, use descending order:
>>> sorted([2, 4, 1])
[1, 2, 4]
>>> sorted([2, 4, 1], reverse=True)
[4, 2, 1]
The key to understand custom sorting
While this system is convenient for very simple cases, real-life situations often require to choose how to sort your iterable.
In many languages, this is done by providing a function that must return 0 for equality, a positive number if the first element is the greater or a negative one if the second element is greater.
E.G., in javascript, we have a list to sort by the number of characters in the word:
> const hellos = [
... { language: 'Spanish', word: 'hola' },
... { language: 'Italian', word: 'ciao' },
... { language: 'French', word: 'bonjour' },
... { language: 'Gerudo', word: "sav'aaq:" },
... ];
> // subtracting the length gives us 0 if they are equal, a positive
> // number if hello1 is longer, and a negative one if hello2 is longer
> function compareElements(hello1, hello2){
... return hello1.word.length - hello2.word.length;
... }
> hellos.sort(compareElements);
[
{ language: 'Spanish', word: 'hola' },
{ language: 'Italian', word: 'ciao' },
{ language: 'French', word: 'bonjour' },
{ language: 'Gerudo', word: "sav'aaq" }
]
> for (const hello of hellos) {
... console.log(`${hello.language}: ${hello.word}`);
... }
Spanish: hola
Italian: ciao
French: bonjour
Gerudo: sav'aaq
In Python, we also provide a function. But the function doesn't compare 2 elements. It takes a single element, and returns the part of this element we want as a sorting criteria. This criteria will then be compared using “natural ordering”.
E.G.:
>>> hellos = [
... { "language": 'Spanish', "word": 'hola' },
... { "language": 'Italian', "word": 'ciao' },
... { "language": 'French', "word": 'bonjour' },
... { "language": 'Gerudo', "word": "sav'aaq:" },
... ]
>>> # Since we want to sort on the size of each word, we simply return
>>> # the size of the word
>>> def this_is_what_to_order_on(one_hello):
... return len(one_hello['word'])
>>> # We give this function to sorted() using the "key" parameter
>>> # Note we don't add parenthesis after this_is_what_to_order_on
>>> # since we pass the function itself, not the result of the function.
>>> sorted(hellos, key=this_is_what_to_order_on)
[
{ language: 'Spanish', word: 'hola' },
{ language: 'Italian', word: 'ciao' },
{ language: 'French', word: 'bonjour' },
{ language: 'Gerudo', word: "sav'aaq" }
]
>>> for hello in hellos:
... print(f"{hello['language']}: {hello['word']}")
...
Spanish: hola
Italian: ciao
French: bonjour
Gerudo: sav'aaq
Normally, sorted()
can’t order an iterable of dicts. But because we pass a key function, we tell it how to sort them. sorted()
will call this_is_what_to_order_on()
on every element and deduct the order using the "natural ordering" or what the function returns, here the lentgh of the word.
It’s important to understand that sorted()
only uses the key function to know what's the value of the current element in the hierarchy. It does no transform the elements. The order of the elements will change, but the elements themselves won't change.
Have you ordered more functions?
While sorted()
is the main way to sort things in Python, there are more options.
All of them accept the key
parameter to pass a custom ordering logic.
Sorting in place
list.sort()
sort a list in place, meaning that unlike sorted()
, it will modify the original object:
>>> l = [2, 1, 3]
>>> l.sort()
>>> l
[1, 2, 3]
Be careful with it, it returns None
. It's a mistake beginners often do:
>>> l = l.sort()
>>> type(l) # your list has been destroyed
<class 'NoneType'>
list.sort()
is less convenient than sorted because it only works on a list, and modifies it. For this price, it is faster, and consumes less memory.
Finding boundaries
Sometimes you don't want to order the whole thing, just find the biggest or smallest value. min()
and max()
will help with that:
>>> min((4, 1, 3))
1
>>> max(hellos, key=this_is_what_to_order_on)
{'language': 'Gerudo', 'word': "sav'aaq:"}
A quality order is about the delivery
You don't always have to code long functions to pass as keys.
You can use built-in ones:
>>> animals = ['bird', 'cat', 'dog', 'fish']
>>> sorted(animals, key=len)
['cat', 'dog', 'bird', 'fish']
>>> batiments = ["080", "001", "10"]
>>> sorted(batiments)
['001', '080', '10']
>>> sorted(batiments, key=int)
['001', '10', '080']
>>> sorted([-10, 0, 1, -2, 11])
[-10, -2, 0, 1, 11]
>>> sorted([-10, 0, 1, -2, 11], key=abs)
[0, 1, -2, -10, 11]
Another trick is the fact any method in Python accept self
as first argument. So if you call it from a class, and not an object, you can use it like a function!
E.G., you can call str.lower()
like a method or a function:
>>> "BITE CODE!".lower()
'bite code!'
>>> str.lower("BITE CODE!")
'bite code!'
This can be used at our advantage in a sort function.
E.G., case insensitive sorting:
>>> sorted(["Z", "A", "a", "b", "B", "a"], key=str.lower)
['A', 'a', 'a', 'b', 'B', 'Z']
Of course, you can use lambda
to make things shorter and more to the point.
Instead of:
>>> def this_is_what_to_order_on(one_hello):
... return len(one_hello['word'])
>>> sorted(hellos, key=this_is_what_to_order_on)
You can reduce it to:
>>> sorted(hellos, key=lambda one_hello: len(one_hello['word']))
It is even acceptable to use a variable e
for "element" in that case because it's so common:
>>> sorted(hellos, key=lambda e: len(e['word']))
And if lambda
works, functools.partial works as well.
One last tip is to use the operator module. It it more verbose, and it does require an import, but you will gain a bit of speed.
Let's say we want to sort movies by release_year:
movies = [
{
"title": "Sharknado",
"release_year": 2013,
},
{
"title": "Sharknado 2: The Second One",
"release_year": 2014,
},
{
"title": "Sharknado 3: Oh Hell No!",
"release_year": 2015,
},
{
"title": "Sharknado: The 4th Awakens",
"release_year": 2016,
},
{
"title": "Sharknado 5: Global Swarming",
"release_year": 2017,
}
]
for movie in sorted(movies, key=lambda e: e["release_year"]):
print(movie['title'])
Sharknado
Sharknado 2: The Second One
Sharknado 3: Oh Hell No!
Sharknado: The 4th Awakens
Sharknado 5: Global Swarming
You can do the same with operator.itemgetter
:
>>> from operator import itemgetter
...
... for movie in sorted(movies, key=itemgetter("release_year")):
... print(movie['title'])
...
And it's a tad faster:
>>> %timeit sorted(movies * 10000, key=itemgetter("release_year"))
2.28 ms ± 47.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit sorted(movies * 10000, key=lambda e: e["release_year"])
2.83 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Sorting out equality
If I generate a bunch of totally random kids, I may end up in a pickle for sorting them by age:
import random
>>> kids = [
... {"name": "Stan", "age": random.randint(8, 10)},
... {"name": "Kyle", "age": random.randint(8, 10)},
... {"name": "Cartman", "age": random.randint(8, 10)},
... {"name": "Kenny", "age": random.randint(8, 10)}
... ]
...
>>>
>>> kids
[{'name': 'Stan', 'age': 9}, {'name': 'Kyle', 'age': 8}, {'name': 'Cartman', 'age': 10}, {'name': 'Kenny', 'age': 9}]
I have 2 kids with the same age, how to know in which order they will be?
Well, when Python compares lists or tuples, it compares them element by element:
>>> [1, 2] > [1, 1]
True
>>> [1, 2] > [1, 3]
False
There, the first elements are equal, so Python goes to the second ones, and compare that.
We can use that in sorting to our advantage, by making the key function return a tuple.
>>> def sort_on_age_then_name(kid):
... return (kid['age'], kid['name'])
...
... for kid in sorted(kids, key=sort_on_age_then_name):
... print(kid["name"], ":", kid["age"])
...
Kyle : 8
Kenny : 9
Stan : 9
Cartman : 10
Here the tuple contains the age as the first element, and the name as second. So Python will sort numerically on the first element, and if it encounters an equality, it will go to the second element, and compare alphabetically.
You can make it infinitely complex.
E.G., sort on the sum of the age and the length of the name, plus in case of equality, sort on the age, and then in case of equality again, sort on the name:
def custom_sort(kid):
return (kid['age'] + len(kid['name']), kid['age'], kid['name'])
Defying order
You can rebel and tell Python that you know better, and that you will now decide what's the real value of an object.
E.G., here is a class to represent a country food ranking, perfectly objectively:
class Country:
def __init__(self, name, michelin_stars):
self.name = name
self.michelin_stars = michelin_stars
# Two countries are equal if the name is
def __eq__(self, other):
return self.name == other.name
# If objects are compared using > or <, we compare the number
# of stars. If it's France, it obviously always wins, no
# matter the starts as the laws of the universe explicitly
# require it
def __lt__(self, other):
# Is the current object lower than the other object?
if self.name == "France":
return False
if other.name == "France":
return True
return self.michelin_stars < other.michelin_stars
def __gt__(self, other):
# Is the current object greater than the other object?
if self.name == "France":
return True
if other.name == "France":
return False
return self.michelin_stars > other.michelin_stars
def __repr__(self):
return f"Country({self.name}, {self.michelin_stars})"
sorted()
, max()
and min()
will use those methods to define the order:
>>> france = Country("France", 628)
... japan = Country("Japan", 519)
... usa = Country("United States", 171)
... spain = Country("Spain", 204)
... germany = Country("Germany", 309)
...
... print(max([france, japan, usa, spain, germany]))
...
Country("France", 628)
>>> france = Country("France", 1)
>>> print(max([france, japan, usa, spain, germany]))
Country(France, 1)
All hail to our AI overlord
Those kinds of short snippets you never remember are typically the type of things large language models are very good at.
Years ago, we had cheat sheets for them, and we copy/paste recipes, or googled our way to stackoverflow.
But now that you understand the general idea behind sorting, no need to keep it in your memory or at hand.
Next time you want to sort stuff, just ask ChatGPT or codepilot, and they will likely give you exactly what you need, in way less time than searching for it.
Just make sure you understand this article, and you will be able to get the code it gives you and insert it correctly.
This is the beauty of those new tools: you can focus on understanding, not on overloading your brain file system. It will come with repetition anyway.
The Javascript example with a comparison function could also be transcribed to Python in the same fashion by using the cmp_to_key() function from the functools module (https://docs.python.org/3/library/functools.html#functools.cmp_to_key , also https://stackoverflow.com/questions/5213033/sort-a-list-of-lists-with-a-custom-compare-function).