Summary
If you code long enough with any language, there are things you will eventually wish you could tweak. As of 2023, these are mine:
- Adding ? as a KeyError, IndexError and AttributeError operator
- Give more visibility to partial()
- Kill implicit string concatenation
- Make iter[] and callable[] type hint generics
- Introduce a Path literal
- Adding a .get() method to list
- Providing unpacking for dicts
- Making itertools' islice, takewhile and dropwhile into slicing syntax
- Add lazy imports and calculations
- Provide a mechanism for export in modules
- Turn the generator declaration into something more explicit
- Add exception declaration to type hints
- One can always dream
June might be a bit early for making a list of things I wish to get into Python one day, but to my defense, I have been a very good boy.
Please note that this article will have little influence on anything, and that it's mostly cathartic. I update it in my head every 3 months or so; sometimes you just gotta write it down. It's also an excuse to review features and patterns and share them with you. I'll use all the tricks to motivate you to come back and read.
But all of them are possibilities: none of them would change the language to the point of destroying its philosophy or style. I'm not talking about multi-line lambdas or copy-on-write immutable data structures here. Maybe breaking compat. Just a little bit.
?
Yes, that's the title.
I wish for a ? operator. And the complementary ?. and ?[ that come with it.
Indeed, in many languages, ? allows me to say "if something is null or unset, use this other thing".
E.g., in JS, you can do:
const count = start ?? 5;
It will set count to start if it exists, or use 5 if it doesn't. Unlike ||, it will work with start being 0 as well.
But the best feature is the null handling when traversing data structures like:
some.deeply?.nested?.thing() ?? "default value"
This will call thing() only if the whole chain exists, or use the default value. No need to handle errors.
Now in Python, None handling would not be as useful as having ?[ catch KeyError and IndexError while ?. would catch AttributeError:
some?.deeply?["thing"] ?? "default value"
And that would be super sweet.
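In the meantime, the closest thing is a small helper that swallows those three exceptions while walking a path. Here is a minimal sketch (dig() is a hypothetical name, not something from the stdlib):
def dig(obj, *path, default=None):
    # Walk a mix of attributes and keys/indices, bail out with the default
    # as soon as anything is missing along the way.
    for step in path:
        try:
            obj = getattr(obj, step)
        except (AttributeError, TypeError):
            try:
                obj = obj[step]
            except (KeyError, IndexError, TypeError):
                return default
    return obj

value = dig(some, "deeply", "thing", default="default value")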
Give more visibility to partial()
I like functools.partial(). It's a nice, explicit way to prefill arguments for a function:
>>> from functools import partial
>>> from_hex = partial(int, base=16) # prefill int "base" param to be 16
>>> from_hex("FA") # now I got a new hexadecimal converter function
250
It's more explicit than lambda: you see partial(), you know what's going on. Plus the repr() tells you what happens:
>>> from_hex
functools.partial(<class 'int'>, base=16)
Not to mention it's faster than a lambda.
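For comparison, here is the lambda version; it does the same job, but the intent is less direct and the repr() tells you nothing:
>>> from_hex = lambda s: int(s, base=16)  # same behavior, less introspectable
>>> from_hex("FA")
250
>>> from_hex
<function <lambda> at 0x...>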
But partial() is not very popular in the community; in fact, a lot of devs don't even know about it. That's because it's hidden behind an import. To be honest, even I am too lazy to import it sometimes.
Santa, I wish for partial() to become a built-in.
And given that's unlikely, because adding built-ins is a pain in the butt for the core devs, at least we could make it a method on all callables.
That would look so nice:
>>> from_hex = int.partial(base=16)
The language doesn't have currying, so that would compensate for that. And give visibility to a very nice feature.
Kill implicit concatenation
This is a sin:
>>> "a" "b" == "ab"
True
It's handy for long strings, I get it. I use it:
>>> ("This is long"
... " yet on one line")
But it's not worth the bugs people introduce by mistake. Every year I get one.
If you have the same problem, pylint can help, because PEP 3126 couldn't.
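The classic trap is a forgotten comma in a list of strings; the two neighbors silently merge into a single element:
>>> colors = [
...     "red",
...     "green"   # missing comma: "green" and "blue" quietly concatenate
...     "blue",
... ]
>>> colors
['red', 'greenblue']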
Make iter[] and callable[] a thing
When list, tuple and dict became capable of expressing generics in 3.9, we moved from the horror show of:
from typing import List, Tuple, Optional, Union
...
Optional[Union[List[int], Tuple[int]]]
...
To:
...
list[int] | tuple[int] | None
...
And that made typing suddenly bearable.
But there are two things that are still common and a pain to declare: iterables and callables.
Having to import them every time is a chore:
from typing import Iterable, Callable
def bullets_points(elements: Iterable[str], formatter: Callable[[str, str], str]):
...
Since we already have iter() and callable() as built-ins, I would love to be able to do:
def bullets_points(elements: iter[str], formatter: callable[[str, str], str]):
...
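To be fair, since 3.9 the generic-capable versions also exist in collections.abc, but they suffer from the exact same import chore:
from collections.abc import Callable, Iterable

def bullets_points(elements: Iterable[str], formatter: Callable[[str, str], str]):
    ...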
Path literal
In the last few years, there has been a great focus on improving the Python industrialization story: adding an event loop, type hints, improving speed, etc.
Of course, there is a limited budget of time, skills and resources, so this means the scripting story of Python has not moved as much as it used to.
pathlib and f-strings were the last great scripting goodies added to Python: f-strings arrived in 3.6, and pathlib a bit earlier in 3.4.
pathlib.Path suffers from the import syndrome: because you have to explicitly request it, and instantiate the class, it's not as convenient as it could be.
This is where a literal syntax for it would come in handy. Instead of having:
>>> from pathlib import Path
>>> ROOT / Path('project/templates')
You could simply do:
>>> ROOT / p'project/templates'
(or pb'project/templates' for raw bytes paths)
In the context of a shell session or a small file, it's a much better proposition.
This is why I have a hack dedicated to this in PYTHONSTARTUP.
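The gist of such a PYTHONSTARTUP tweak could look like this (a sketch, not the actual hack; P and CWD are names I made up for the example):
# Dropped into the file pointed to by the PYTHONSTARTUP env var, so it only
# affects interactive sessions:
from pathlib import Path

P = Path            # P('project/templates') is almost a literal
CWD = Path.cwd()    # CWD / 'project' / 'templates' reads nicely in a shell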
list.get()
You want a key in a dict and you don't know if it's there? It's as easy as dict.get("key", "default value"). We talked about it last week.
You want the first element of a list and if it's empty get a default value?
That will be a ternary, sir:
first_element = my_list[0] if my_list else default_value
Or if you feel fancy, some unpacking wizardry, an iter() + next() call, a play with or, etc. It's not standardized; you are on your own. Hey, even list.pop() doesn't allow a default value!
Having a list.get(0, default) would help with that.
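In the meantime, the best we can do is a tiny helper (get_item() is a hypothetical name, not a stdlib function):
def get_item(sequence, index, default=None):
    # What list.get() could do: return the element at index, or the default
    # if the sequence is too short.
    try:
        return sequence[index]
    except IndexError:
        return default

first_element = get_item(my_list, 0, default_value)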
Unpacking for dicts
We talked about this before, but extracting variables from a dictionary is the opposite of Batman in Python.
Since I do a lot of web dev, it saddens me to find that for things like this, JS has a better story:
> const fruits = {apple: 1, pear: 2, coconut: 3}
> const {apple = 0, kiwi = 3} = fruits
> apple
1
> kiwi
3
If you want to do the same thing in Python:
>>> fruits = {"apple": 1, "pear": 2, "coconut": 3}
>>> apple, kiwi = fruits.get('apple', 0), fruits.get('kiwi', 3)
>>> apple
1
>>> kiwi
3
On top of that, since JS objects are mappings, destructuring doubles as a way to extract a lot of attributes at once, which, again, we don't have in Python.
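The closest tool we have today is operator.itemgetter(), which extracts several keys at once but offers no per-key default:
>>> from operator import itemgetter
>>> fruits = {"apple": 1, "pear": 2, "coconut": 3}
>>> apple, pear = itemgetter("apple", "pear")(fruits)
>>> apple, pear
(1, 2)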
Some itertools features in slicing
Slices work for lists and tuples, but if you want to get a subset of any iterable, you have to use itertools.islice(). This means it's just easier to do this:
>>> colors = list(generate_colors())[:10] # assuming generate_colors is a generator
which is a waste of CPU and memory, than the right thing:
>>> from itertools import islice
>>> colors = islice(generate_colors(), 0, 10)
Because generators are consumed when they are read, years ago the debate settled, on the principle of least surprise, on not offering this:
>>> colors = generate_colors()[:10]
Personally, I still think this would be worth it.
In the same vein, if you want to get a subset of an iterable based on a starting or ending condition, you have to use itertools.takewhile() and itertools.dropwhile(). Not only do people not know about them, but they are quite counterintuitive, because they work in the opposite way of how people think of starting and ending conditions.
E.g., you have a data set, and you want to start reading data when the signal gets bigger than 200, and stop when it drops below 100.
That would be:
>>> from itertools import takewhile, dropwhile
>>> def signal_too_low(value): return value <= 200
>>> def signal_high_enough(value): return value >= 100
>>> signals = [100, 150, 201, 175, 235, 99, 50]
>>> signals = dropwhile(signal_too_low, signals)
>>> signals = takewhile(signal_high_enough, signals)
>>> list(signals)
[201, 175, 235]
I don't know about you, but I don't find that easy to write, nor to read.
But if slicing were to allow callables, then we could write:
>>> def signal_above_200(value): return value > 200
>>> def signal_below_100(value): return value < 100
>>> signals = [100, 150, 201, 175, 235, 99, 50]
>>> signals = signals[signal_above_200: signal_below_100]
>>> list(signals)
[201, 175, 235]
I much prefer this version.
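To give an idea of the semantics, here is a rough sketch of how callable-aware slicing could be emulated on top of itertools (SliceView is an imaginary class, and lambdas stand in for the named functions above):
from itertools import dropwhile, takewhile

class SliceView:
    # Wraps any iterable and interprets callables in a slice as start/stop
    # conditions, mapping them to dropwhile() and takewhile().
    def __init__(self, iterable):
        self._iterable = iterable

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices")
        result = self._iterable
        if callable(key.start):
            result = dropwhile(lambda value: not key.start(value), result)
        if callable(key.stop):
            result = takewhile(lambda value: not key.stop(value), result)
        return result

signals = SliceView([100, 150, 201, 175, 235, 99, 50])
print(list(signals[(lambda v: v > 200):(lambda v: v < 100)]))  # [201, 175, 235]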
Lazy imports and calculations
Believe it or not, lazy imports, the concept of loading a module only when it's first used instead of when the import statement runs, have been considered recently, and rejected. I do think they will be back, though, because Python startup time is something people are concerned about these days, and some of it is due to imports taking time even when you don't use them.
Of course, it's hard to get the semantics right on such a complex feature, and it raises the question: what about a generic lazy calculation mechanism? Wouldn't implementing the feature just for imports be half-baked?
Indeed, that would be useful for other areas in Python, like mutable default arguments:
def the_typical_python_trap(l=[]): ...
This is usually a bug (although it can be used for caching), and the safe version would be:
def the_typical_python_trap(l=None):
    l = l or []
Yet with lazy it would give us something like:
def the_typical_python_trap(l=lazy []): ...
lazy being a totally imaginary implementation.
There is, of course, a different PEP to solve this, but a generic solution is appealing.
It's even more visible with dataclasses:
from dataclasses import dataclass, field

@dataclass
class User:
    friends: list[int] = field(default_factory=list)
would become:
@dataclass
class User:
    friends: list[int] = lazy []
Haskell devs are probably having a laugh right now.
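Waiting for a real lazy keyword, the spirit of it can be faked with a decorator that evaluates callable defaults at call time (lazy_defaults is a toy name, not a real library):
import inspect
from functools import wraps

def lazy_defaults(func):
    # Any default that is callable gets called on each invocation where the
    # argument is missing, so every call gets a fresh value.
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, param in signature.parameters.items():
            if name not in bound.arguments and callable(param.default):
                bound.arguments[name] = param.default()
        return func(*bound.args, **bound.kwargs)

    return wrapper

@lazy_defaults
def the_typical_python_trap(l=list):  # note: the list type, not []
    l.append(1)
    return l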
Explicit export in modules
While I don't like having to export things from modules, I wish I could, like with type hints, opt in to do so. The current way to define a public API in Python is to prefix any private thing with _. This means everything is public unless you say otherwise.
This works fine for a lot of projects, but if you make a library or a framework, you usually want the opposite: everything is private unless you say so. Because exposing something as public is a commitment to support it in the future.
We do have the magic __all__ variable, but it only restricts what you can import with *, which is something you should rarely do anyway, so it's not helping much.
It's not easy to define a clean system for this, but __all__ has the right idea. An __export__ variable with the same syntax, but actually whitelisting what you can import from the current module, would be quite an improvement.
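For reference, here is how the two would sit side by side; __all__ is real, __export__ is the imaginary addition (module name is made up):
# somewhere in mylib/client.py (hypothetical module)
__all__ = ["fetch", "Client"]       # real: only limits "from mylib.client import *"
__export__ = ["fetch", "Client"]    # imaginary: everything else would not be
                                    # importable from outside the module at all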
A better def for yield
Beginners have a hard time understanding yield. One of the reasons is that a generator has the same syntax as a regular function.
This is a function:
def reticulating():
return splines()
This is a generator factory:
def reticulating():
yield splines()
The syntax looks the same, but those two snippets don't represent the same type of object at all. The presence of yield changes the nature of the whole block, which is not obvious.
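A quick interactive session shows how easy it is to be fooled; nothing at the call site hints that no code has run yet:
>>> def reticulating():
...     yield splines()
...
>>> reticulating()   # the body has not executed, you just got a generator back
<generator object reticulating at 0x...>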
When adding syntax for coroutines, the decision was much better:
async def reticulating():
await splines()
Because the function starts with async def and not just def, it's very clear something special is going on.
I wish we would have that for generators as well. Something like:
gen def reticulating():
yield splines()
Exception declaration
The main reason a lot of Python coders use a very broad try/except is that it's hard to figure out which exact Exception subclass to catch.
Consider this:
try:
    process_result(requests.get("https://bitecode.dev"))
except Exception as ex:
    print('The request has failed:', ex)
Yes, it's bad practice, because it will catch way too many types of exceptions, including the ones raised by process_result() or bugs in requests that are unrelated to the network errors you may get. But worse, it will use the same handling for all of them. You probably don't want to deal with a timeout in the same way you handle a redirection loop.
But how do you figure out which exception to catch? Even the API reference doesn't include the answer.
And once you find the answer, there are many of them: Timeout, ConnectionError, HTTPError, TooManyRedirects and InvalidURL.
Depending on your process, you might want to catch only some of them. In my snippet, I hard-code the URL, so InvalidURL is not useful to catch.
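For the record, here is what targeted handling looks like today; the exception classes are real requests ones, the messages are just placeholders:
import requests
from requests.exceptions import ConnectionError, HTTPError, Timeout, TooManyRedirects

try:
    response = requests.get("https://bitecode.dev", timeout=5)
    response.raise_for_status()   # turn 4xx/5xx responses into HTTPError
    process_result(response)
except Timeout:
    print("The request timed out")
except TooManyRedirects:
    print("Redirection loop, check the URL configuration")
except (ConnectionError, HTTPError) as ex:
    print("The request has failed:", ex)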
Like with type hints, I wish we had optional annotations to declare what kind of exceptions some code can raise, so that if a library provides them, your IDE can list them, and even offer to generate the try/except block.
The benefits of Java's clear error declarations, without being required to catch them.
This would also encourage library docs to list them, and ease the burden of doing so since bubbling exceptions would be included automatically, something sphinx doesn't do with docstring declarations.
Finding the syntax for that would be a challenge, though. Python function signatures are already quite packed.
def fetch(
    key
) -> (
    int
    or raise KeyError, IndexError
):
    ...
Is explicit but really ugly.
Great article. I can definitely relate to making iter[] and callable[] type hint generics. Have you considered submitting that as an extension to PEP 585? Similarly, I do feel a Path literal is a fantastic idea and could be submitted as a PEP.
W.r.t. a better def for yield: yield from would match the equivalent functionality of return splines(), if I'm not mistaken.