Summary
- You can merge dicts with dict.update(), **, | and collections.ChainMap.
- Use dict.get(), dict.setdefault(), collections.defaultdict and __missing__ to deal with missing values.
- operator.itemgetter and match/case will help you extract several values at once.
- Dict comprehensions and dict.fromkeys() are excellent tools to initialize mappings.
- Keys are not limited to strings, and dict.keys() allows set-like behavior.
- Dicts make great registries. Store callables in them.
- We can abuse types.MappingProxyType to make a dict immutable.
Merging dicts
Merging dicts should not be a complicated issue. You take 2 dicts, you get one, simple.
But while you can add two lists in Python, you cannot add two dicts.
For a long time, the only obvious way was the .update() method:
>>> alloy = {"Vibranium": "3g", "Star Metal Ore": "2g"}
>>> alloy.update({"Vibranium": "2g", "Sonium": "1g"})
>>> alloy
{'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g'}
Because the method mutates the dict in place, if you didn't want to affect any existing data, you had to create an empty dict and update it twice:
>>> alloy = {"Vibranium": "3g", "Star Metal Ore": "2g"}
>>> new_alloy = {}
>>> new_alloy.update(alloy)
>>> new_alloy.update({"Vibranium": "2g", "Sonium": "1g"})
>>> new_alloy
{'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g'}
That is, until PEP 448 in Python 3.5 finally let us express it with unpacking:
>>> alloy = {"Vibranium": "3g", "Star Metal Ore": "2g"}
>>> new_alloy = {**alloy, "Vibranium": "2g", "Sonium": "1g"}
>>> new_alloy
{'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g'}
It's quite handy, but:
- You still don't have symmetry with lists, tuples and sets, which can use a simple operator.
- You cannot use the same syntax to update the dict in place.
- It's easy to confuse * with **. Both work with dicts, but the former only spreads the keys into an iterable.
- You can't use a dunder method to intercept this operation.
For all those reasons, PEP 584 added a new trick to Python 3.9. You can now use the | operator to merge dicts, like we already could with sets:
>>> alloy = {"Vibranium": "3g", "Star Metal Ore": "2g"}
>>> new_alloy = alloy | {"Vibranium": "2g", "Sonium": "1g"}
>>> new_alloy
{'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g'}
>>> new_alloy |= {"Adamantium": "10g"}
>>> new_alloy
{'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g', 'Adamantium': '10g'}
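Two details around | are worth a quick sketch, based on my reading of PEP 584: |= accepts anything .update() accepts (such as a list of key/value pairs), while | wants a dict on both sides. And because the operator goes through the __or__ dunder, a subclass can now intercept the merge. The Inventory class and its summing behavior are hypothetical, for illustration only.

```python
alloy = {"Vibranium": "3g", "Star Metal Ore": "2g"}

# |= behaves like .update(): any iterable of (key, value) pairs is accepted
alloy |= [("Sonium", "1g")]

# | only merges with another dict, so this raises TypeError
merge_failed = False
try:
    alloy | [("Adamantium", "10g")]
except TypeError:
    merge_failed = True


class Inventory(dict):
    """Hypothetical dict subclass: | sums integer quantities instead of overwriting."""

    def __or__(self, other):
        merged = Inventory(self)
        for key, grams in other.items():
            merged[key] = merged.get(key, 0) + grams
        return merged


stock = Inventory({"Vibranium": 3, "Sonium": 1}) | {"Vibranium": 2, "Adamantium": 10}
print(stock)  # {'Vibranium': 5, 'Sonium': 1, 'Adamantium': 10}
```

Overriding __or__ alone leaves |= untouched here: since __ior__ is not overridden, dict's own in-place merge still applies to Inventory.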
One lesser-known tool is ChainMap, added in 3.3. It takes several dictionaries and tries to read a key from the first one, then, if it's missing, from the second one, and so on. Any new key (or update) is only written to the first one:
>>> from collections import ChainMap
>>> first = {}
>>> second = {"Vibranium": "3g", "Star Metal Ore": "2g"}
>>> third = {"Vibranium": "2g", "Sonium": "1g"}
>>> alloy = ChainMap(first, second, third)
>>> alloy
ChainMap({}, {'Vibranium': '3g', 'Star Metal Ore': '2g'}, {'Vibranium': '2g', 'Sonium': '1g'})
>>> alloy['Vibranium'] # priority matters
'3g'
>>> alloy['Trinium'] = "4g"
>>> alloy
ChainMap({'Trinium': '4g'}, {'Vibranium': '3g', 'Star Metal Ore': '2g'}, {'Vibranium': '2g', 'Sonium': '1g'})
>>> first # dicts are not copied, but referenced
{'Trinium': '4g'}
It's a useful tool if you can't afford to copy mappings, need to contain updates, or just want to easily look up several data sources.
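To illustrate the "contain updates" use case, here is a small sketch using ChainMap.new_child(), which pushes a fresh empty dict in front of the existing ones, and .parents, which drops it again. The recipe/draft naming is just an assumption for the example.

```python
from collections import ChainMap

defaults = {"Vibranium": "3g", "Sonium": "1g"}
recipe = ChainMap(defaults)

# new_child() returns a new ChainMap with an empty dict in front
draft = recipe.new_child()
draft["Sonium"] = "5g"  # the write lands in the new front dict only

print(draft["Sonium"])     # 5g, found in the front dict
print(draft["Vibranium"])  # 3g, falls through to defaults
print(defaults)            # {'Vibranium': '3g', 'Sonium': '1g'}, untouched

# .parents skips the front dict, giving back the original view
print(draft.parents["Sonium"])  # 1g
```

Throwing the draft away is as simple as dropping the reference: the underlying dicts were never modified.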
Getting one value out
Looking something up in a dict is quite straightforward:
>>> alloy = {'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g', 'Adamantium': '10g'}
>>> alloy['Sonium']
'1g'
The problem occurs when you want a value that may not be there.
You can always catch KeyError or just use the in operator:
>>> try:
...     alloy['Mithril']
... except KeyError:
...     print('Ishkhaqwi ai durugnul!')
...
Ishkhaqwi ai durugnul!
>>> if "Mithril" in alloy:
...     print('Mellon')
...
But more often than not, you want a default value if the key is missing, and this is what the .get() method gives you:
>>> alloy.get('Vibranium', "7g")
'2g'
>>> alloy.get('Mythril', "7g")
'7g'
A bit less known is the .setdefault() method, which works like .get() but also inserts the value if it's missing:
>>> alloy.setdefault('Mythril', '7g')
'7g'
>>> alloy
{'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g', 'Adamantium': '10g', 'Mythril': '7g'}
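A classic use of .setdefault() is grouping, since it returns the value stored in the dict (new or existing), which you can mutate directly. A minimal sketch, grouping metal names by their first letter:

```python
metals = ["Vibranium", "Sonium", "Star Metal Ore", "Adamantium", "Mythril"]

by_initial = {}
for metal in metals:
    # setdefault returns the existing list, or inserts and returns a new one
    by_initial.setdefault(metal[0], []).append(metal)

print(by_initial)
# {'V': ['Vibranium'], 'S': ['Sonium', 'Star Metal Ore'], 'A': ['Adamantium'], 'M': ['Mythril']}

# if the key already exists, the default is ignored and the stored value returned
existing = by_initial.setdefault("V", ["ignored"])
print(existing)  # ['Vibranium']
```

Note that the default argument is evaluated on every call, so a throwaway [] is built even when the key exists.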
collections.defaultdict generalizes this by accepting a factory function to create any missing value:
>>> from collections import defaultdict
>>> from datetime import datetime
>>> last_access = defaultdict(datetime.now) # on missing key, call datetime.now()
>>> last_access['user378']
datetime.datetime(2023, 6, 13, 8, 32, 9, 132303)
>>> last_access
defaultdict(<built-in method now of type object at 0x97d9e0>, {'user378': datetime.datetime(2023, 6, 13, 8, 32, 9, 132303)})
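In practice, the most common factories are plain types like list and int, since calling them with no argument gives an empty list or 0. A short sketch of grouping and counting, plus one gotcha:

```python
from collections import defaultdict

metals = ["Vibranium", "Sonium", "Star Metal Ore", "Adamantium", "Mythril"]

by_initial = defaultdict(list)  # missing keys get list(), i.e. []
for metal in metals:
    by_initial[metal[0]].append(metal)

lengths = defaultdict(int)  # missing keys get int(), i.e. 0
for metal in metals:
    lengths[len(metal)] += 1

# Careful: merely reading a missing key with [] inserts it
by_initial["Z"]
print("Z" in by_initial)  # True
```

If you don't want that side effect on lookups, use .get() on the defaultdict, which does not call the factory.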
This method is handy, but doesn't give the factory function access to the key. The alternative is to use the __missing__ dunder method on a custom dict type:
>>> class Words(dict):
...     # if the key doesn't exist, this is called to get the value
...     def __missing__(self, key):
...         # you can create the entry in the dict if you want,
...         # but it's not done by default
...         self[key] = len(key)
...         return len(key)
...
>>> stats = Words()
>>> stats['supercalifragilisticexpidelilicious']
35
>>> stats
{'supercalifragilisticexpidelilicious': 35}
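One subtlety: __missing__ is only triggered by the square-bracket lookup (dict.__getitem__). Neither .get() nor the in operator call it, as this sketch reusing the Words class shows:

```python
class Words(dict):
    def __missing__(self, key):
        # only called by d[key] when the key is absent
        self[key] = len(key)
        return len(key)


stats = Words()
via_get = stats.get("mithril")     # None: .get() does not call __missing__
via_in = "mithril" in stats        # False: neither does `in`
via_brackets = stats["mithril"]    # 7: square brackets do
print(stats)                       # {'mithril': 7}
```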
This is used to great effect by collections.Counter:
>>> from collections import Counter
>>> count = Counter()
>>> count['a']
0
>>> count['b']
0
>>> count['c'] += 1
>>> count
Counter({'c': 1})
>>> count.update('fjdkslqmfjsdqlmkfds')
>>> count
Counter({'f': 3, 'd': 3, 's': 3, 'j': 2, 'k': 2, 'l': 2, 'q': 2, 'm': 2, 'c': 1})
>>> count.most_common(3)
[('f', 3), ('d', 3), ('s', 3)]
Getting several values out
Sure, you can always loop:
>>> alloy = {'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g', 'Adamantium': '10g', 'Mythril': '7g'}
>>> [alloy[key] for key in ('Sonium', "Adamantium")]
['1g', '10g']
But there are more declarative ways to do it.
First, we have operator.itemgetter if you need a callable:
>>> from operator import itemgetter
>>> get_quantity = itemgetter('Sonium', 'Adamantium')
>>> get_quantity(alloy)
('1g', '10g')
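Two things that make itemgetter handy beyond this example: with a single key it returns the bare value rather than a 1-tuple, and that makes it a natural sort key. A sketch, assuming quantities are stored as integers (sorting the original "10g" strings would compare them alphabetically):

```python
from operator import itemgetter

alloys = [
    {"name": "Vibranium", "grams": 2},
    {"name": "Adamantium", "grams": 10},
    {"name": "Sonium", "grams": 1},
]

# with a single key, itemgetter returns the value itself, not a tuple
get_grams = itemgetter("grams")
print(get_grams(alloys[0]))  # 2

# which makes it a handy key function for sorted(), min(), max()...
by_weight = sorted(alloys, key=itemgetter("grams"))
print([a["name"] for a in by_weight])  # ['Sonium', 'Vibranium', 'Adamantium']
```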
Second, if you are rooting for a more syntax-oriented tool (or if you need matching as well), 3.10 now offers match/case as an alternative:
>>> match alloy:
...     case {"Sonium": sonium, "Adamantium": adamantium}:
...         print(sonium, adamantium)
...
1g 10g
There aren't many things I envy JavaScript for, but destructuring is one of them. It's too bad that it's not only more verbose to do in Python, but that there is also no easy way to request a default value.
This also makes it needlessly hard to get a subset of a dictionary.
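The usual workaround is a dict comprehension. Here is a sketch with a hypothetical pick() helper that extracts a subset and fills in a default for missing keys, and a variant that simply skips them:

```python
def pick(mapping, keys, default=None):
    """Hypothetical helper: a new dict with only `keys`, using `default` for missing ones."""
    return {key: mapping.get(key, default) for key in keys}


alloy = {"Vibranium": "2g", "Star Metal Ore": "2g", "Sonium": "1g", "Adamantium": "10g"}

subset = pick(alloy, ["Sonium", "Mythril"], default="0g")
print(subset)  # {'Sonium': '1g', 'Mythril': '0g'}

# or keep only the keys that actually exist
existing = {key: alloy[key] for key in ("Sonium", "Mythril") if key in alloy}
print(existing)  # {'Sonium': '1g'}
```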
Creating a dict
While literals are the default way of creating dicts, surprisingly there are still rogue agents in the community who prefer the function syntax:
>>> dict(Sonium="1g", Adamantium="3g")
{'Sonium': '1g', 'Adamantium': '3g'}
It is indeed a bit easier to type on a French keyboard layout, where {} are not the most accessible. But you can't use this syntax for 'Star Metal Ore' because of the spaces.
It's still very useful for creating a dict out of an iterable of tuples, which was the default way of dynamically creating dicts before comprehensions came along:
>>> dict([('Vibranium', '2g'), ('Star Metal Ore', '2g'), ('Sonium', '1g'), ('Adamantium', '10g'), ('Mythril', '7g')])
{'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g', 'Adamantium': '10g', 'Mythril': '7g'}
And it's still very useful because of tools like zip() or enumerate() that can return exactly that:
>>> metal
('Vibranium', 'Star Metal Ore', 'Sonium', 'Adamantium', 'Mythril')
>>> quantity
('2g', '2g', '1g', '10g', '7g')
>>> dict(zip(metal, quantity))
{'Vibranium': '2g', 'Star Metal Ore': '2g', 'Sonium': '1g', 'Adamantium': '10g', 'Mythril': '7g'}
However, dict comprehensions are kinda awesome:
>>> metal
('Vibranium', 'Star Metal Ore', 'Sonium', 'Adamantium', 'Mythril')
>>> {m: f"{len(m)}g" for m in metal}
{'Vibranium': '9g', 'Star Metal Ore': '14g', 'Sonium': '6g', 'Adamantium': '10g', 'Mythril': '7g'}
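Comprehensions also shine for transforming, filtering and inverting an existing dict. A sketch, assuming the quantities always end with a single "g":

```python
alloy = {"Vibranium": "2g", "Star Metal Ore": "2g", "Sonium": "1g", "Adamantium": "10g"}

# transform: turn "10g" strings into integer grams
grams = {metal: int(qty.rstrip("g")) for metal, qty in alloy.items()}

# filter: keep only the metals above 1 gram
heavier = {metal: g for metal, g in grams.items() if g > 1}

# invert: values become keys, so duplicate values collapse (last one wins)
by_weight = {g: metal for metal, g in grams.items()}
print(by_weight)  # {2: 'Star Metal Ore', 1: 'Sonium', 10: 'Adamantium'}
```

The inversion quietly dropped Vibranium, which is worth remembering before flipping any dict whose values aren't unique.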
Despite that awesomeness, I must admit my favorite underused feature remains dict.fromkeys(), which will create a dict out of any iterable. It's so versatile:
>>> dict.fromkeys('fjdskqlfd')
{'f': None, 'j': None, 'd': None, 's': None, 'k': None, 'q': None, 'l': None}
>>> dict.fromkeys(metal, '0g')
{'Vibranium': '0g', 'Star Metal Ore': '0g', 'Sonium': '0g', 'Adamantium': '0g', 'Mythril': '0g'}
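One trap with dict.fromkeys(): the value is not copied, so every key points to the very same object. That's harmless with strings like '0g', but surprising with a mutable default. A short sketch contrasting it with a comprehension:

```python
metal = ("Vibranium", "Sonium")

shared = dict.fromkeys(metal, [])
shared["Vibranium"].append("2g")
print(shared)  # {'Vibranium': ['2g'], 'Sonium': ['2g']}: one list for both keys!

# a comprehension evaluates [] once per key, giving independent lists
separate = {m: [] for m in metal}
separate["Vibranium"].append("2g")
print(separate)  # {'Vibranium': ['2g'], 'Sonium': []}
```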
Playing with keys
Python dicts have one unusual characteristic: keys can be anything hashable, not just strings.
This means keys can be any arbitrary object as long as it defines __hash__, which all the built-in immutable types in Python do: strings, bytes, numbers, booleans, frozensets, and tuples (as long as their contents are hashable too)...
Using coordinates as keys is quite handy:
>>> coords = {}
>>> coords[(3, 1)] = "blue"
>>> coords[(2, 2)] = "red"
>>> coords
{(3, 1): 'blue', (2, 2): 'red'}
This is particularly useful during Advent of Code. Unlike arrays, a dict doesn't require allocating every existing point, coordinates can start at 1000000 without manually dealing with an offset, it makes it easy to de-duplicate similar combinations, and it's very simple to sort by groups of values.
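A typical puzzle-style sketch: parse a text grid into a dict keyed by (x, y) tuples, then look up neighbours with .get(), which returns None off the edges so no bounds checks are needed. The input grid and the neighbours() helper are made up for illustration:

```python
# A made-up puzzle input: '#' marks walls
raw = """\
#..#
.##.
#..#"""

grid = {
    (x, y): char
    for y, line in enumerate(raw.splitlines())
    for x, char in enumerate(line)
}


def neighbours(point):
    """Hypothetical helper: the 4 orthogonal neighbours, None when off the grid."""
    x, y = point
    return [grid.get((x + dx, y + dy)) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]


print(neighbours((0, 0)))  # ['.', None, '.', None]
walls = {point for point, char in grid.items() if char == "#"}
```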
When dealing with numbers as keys though, remember:
- Pre-allocated lists are very fast. Dicts hash every key.
- True == 1 and False == 0, so they collide as keys:
>>> {1: "1", True: "True"}
{1: 'True'}
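The same goes for floats: 1, 1.0 and True compare equal and hash the same, so they all land on one entry. The dict keeps the first key object it saw and the last value assigned:

```python
mixed = {1: "int", 1.0: "float", True: "bool"}
print(mixed)  # {1: 'bool'}
```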
One more thing: dict.keys() returns a dict_keys view which:
- Doesn't copy the dict, so it takes little memory, always reflects the dict's current keys, and is fast.
- Allows some set-like operations:
>>> alloy = {'Vibranium': '0g', 'Star Metal Ore': '0g', 'Sonium': '0g', 'Adamantium': '0g', 'Mythril': '0g'}
>>> keys = alloy.keys()
>>> keys & ["Vibranium", "Redstone"]
{'Vibranium'}
>>> keys | ["Vibranium", "Redstone"]
{'Redstone', 'Mythril', 'Vibranium', 'Sonium', 'Adamantium', 'Star Metal Ore'}
>>> keys ^ ["Vibranium", "Redstone"]
{'Redstone', 'Mythril', 'Sonium', 'Adamantium', 'Star Metal Ore'}
And if you don't know what the heck is going on here, check out Python sets, they are really nice.
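Two more properties of views, in a short sketch: a keys view stays live as the dict changes, and items() views are set-like too, as long as the values are hashable.

```python
alloy = {"Vibranium": "2g", "Sonium": "1g"}
keys = alloy.keys()

alloy["Mythril"] = "7g"
print(list(keys))  # ['Vibranium', 'Sonium', 'Mythril']: the view sees the new key

# items() views support set operations when values are hashable
other = {"Vibranium": "2g", "Sonium": "3g"}
common = alloy.items() & other.items()
print(common)  # {('Vibranium', '2g')}
```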
Dicts are great registries
One of my favorite underused patterns is that dicts can hold references to callables, or collections of callables, which makes them great registries.
Whether you want a reusable switch:
>>> codes_handlers = {200: lambda: print('yeah'), 404: lambda: print('where did it go?'), 500: lambda: print('Sorry :(')}
>>> codes_handlers[200]()
yeah
Or some kind of event system:
>>> listeners = {
... "on_save": [],
... "on_load": [],
... "on_send": []
... }
...
>>> listeners.setdefault("on_send", []).append(lambda: print('New article !'))
>>> listeners.setdefault("on_send", []).append(lambda: print('Publish on Hacker news'))
>>> for handler in listeners.get('on_send', []):  # 2 hours later, you do this
...     handler()
...
New article !
Publish on Hacker news
You can usually quickly hack something decent without immediately looking for the big guns.
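A common step up from filling the dict by hand is a decorator that registers functions as they are defined. A minimal sketch; the handlers dict, the on() decorator and emit() are hypothetical names, not an existing API:

```python
handlers = {}  # hypothetical registry: event name -> list of callables


def on(event):
    """Decorator registering the decorated function for `event`."""
    def register(func):
        handlers.setdefault(event, []).append(func)
        return func  # returned unchanged, so it can still be called directly
    return register


@on("on_send")
def announce():
    return "New article !"


@on("on_send")
def share():
    return "Publish on Hacker news"


def emit(event):
    # .get() with a default means unknown events are simply ignored
    return [handler() for handler in handlers.get(event, [])]


print(emit("on_send"))  # ['New article !', 'Publish on Hacker news']
print(emit("on_load"))  # []
```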
Bonus stage: making a dict immutable
list has tuple, set has frozenset, but there is no equivalent for dictionaries.
Yet, we can create one with types.MappingProxyType:
>>> alloy = {'Vibranium': '9g', 'Star Metal Ore': '14g', 'Sonium': '6g', 'Adamantium': '10g', 'Mythril': '7g'}
>>> from types import MappingProxyType
>>> forged_alloy = MappingProxyType(alloy)
>>> forged_alloy['Sonium']
'6g'
>>> forged_alloy['Sonium'] = "1g"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment
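Keep in mind that MappingProxyType is a read-only view, not a frozen copy: whoever still holds the original dict can mutate it, and the proxy will show the change. A sketch of that, plus the usual workaround of proxying a private copy:

```python
from types import MappingProxyType

alloy = {"Sonium": "6g"}
forged_alloy = MappingProxyType(alloy)

alloy["Sonium"] = "1g"          # the underlying dict is still mutable...
print(forged_alloy["Sonium"])   # 1g: ...and the proxy reflects the change

# proxying a copy nobody else references makes it effectively frozen
frozen = MappingProxyType(dict(alloy))
alloy["Sonium"] = "9g"
print(frozen["Sonium"])         # 1g
print(forged_alloy["Sonium"])   # 9g
```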
That's not going to remotely make Python FP more practical, but it's a start.