Sculpting a Python function

Exchanging complexity for features

May 22, 2023

Summary

You can take a simple function such as:

def iformat(elements):
    for element in elements:
        print("-", element)

And incrementally work on it:

remove side effects;
add parameters;
use dependency injection;
document with docstrings;
add type hints.

When doing so, you trade a bit more complexity for richness. You are free to put the cursor anywhere on the spectrum of those trade-offs.

From zero to hero?

In this article, we are going to take a very dumb Python formatting function, and work on it until it becomes very sophisticated.

For this case study, we will create a function that takes a list and prints the elements of this list:

def iformat(elements):
    for element in elements:
        print("-", element)

As basic as it gets:

>>> iformat([1, 2, 3])
- 1
- 2
- 3

This is our block of marble. Now let's see what we can refine it into.

Remove the side effects

Programs exist only for their side effects. If there were none, they would not be very useful to us, because nothing they do would affect our world. E.g., we would not see the output.

But on the other hand, side effects are hard to reason about, and to test. They are also your connection with this nasty little thing we call reality, which is unpredictable.

That's why it's common to see programmers try to limit the size of their code base that has side effects, and move the side effects to the "border" of the program, while keeping as many functions as pure as possible.

Side effects include (but are not limited to): writing to the disk, communicating with the network, printing on the terminal, modifying data that is out of scope, etc.

In our function, this is a rather simple case: we remove the part where it writes to the terminal. This becomes:

def iformat(elements):
    output = []
    for element in elements:
        output.append(f"- {element}")
    return "\n".join(output)

This does make the implementation more complicated, because we don't have the terminal to buffer the strings for us anymore. Also, it implies the calling code now has to print:

>>> print(iformat([1, 2, 3]))
- 1
- 2
- 3

In exchange for those compromises, we gained the following benefits:

The function can no longer raise encoding errors, system errors, etc. It's very predictable.
It's easy to test: compare the output to a string, and voilà.
You can use it to output the result to a file or a database. We are not tied to a specific input/output anymore.
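
To make the last two points concrete, here is a minimal sketch. The io.StringIO buffer is only a stand-in for a real file, an assumption made to keep the example self-contained:

```python
import io


def iformat(elements):
    output = []
    for element in elements:
        output.append(f"- {element}")
    return "\n".join(output)


# Testing is a plain string comparison: no need to capture the terminal.
assert iformat([1, 2, 3]) == "- 1\n- 2\n- 3"

# The caller picks the destination. An in-memory buffer stands in
# for a real file here (assumption made for the sake of the example).
buffer = io.StringIO()
buffer.write(iformat(["a", "b"]))
written = buffer.getvalue()
```

The function itself never learns where the text ends up, which is what keeps it easy to reason about.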

Adding formatting parameters

This function is fine as long as you only want to format exactly one element per line, with "-" as a bullet. It may fit our use case exactly.

However, if we ever decide to share this function, we may want to let the user decide how the formatting takes place:

def iformat(elements, line_format="- {element}", line_end="\n"):
    output = []
    for element in elements:
        output.append(line_format.format(element=element))
    return line_end.join(output)

Again, the implementation just increased in complexity: now you have to chase each variable to understand exactly what everything does.

For the price of this, we allowed the function to be useful in a broader variety of situations.

This changes nothing for our original use case, thanks to Python's parameter default values:

>>> print(iformat([1, 2, 3]))
- 1
- 2
- 3

But you can change what a line looks like, or even the whole concept of listing:

>>> print(iformat([1, 2, 3], "* {element}"))
* 1
* 2
* 3
>>> print(iformat([1, 2, 3], "{element}", ", "))
1, 2, 3
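
Because line_format goes through str.format(), the usual format specifications and conversions should work as well. A small sketch, with sample values chosen purely for illustration:

```python
def iformat(elements, line_format="- {element}", line_end="\n"):
    output = []
    for element in elements:
        output.append(line_format.format(element=element))
    return line_end.join(output)


# Zero-pad integers to three digits with a format specification.
padded = iformat([1, 20, 300], "{element:03d}")

# Show the repr() of each element with the "!r" conversion.
quoted = iformat(["a", "b"], "{element!r}", " | ")
```

Here padded holds one zero-padded number per line, and quoted holds the quoted strings separated by " | ".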

You don't know what you don't know

This is now a pretty versatile function, but how much versatility do you need?

It's a very hard question to answer, and frankly, there is no perfect answer.

One thing we can do, though, is let the end user decide, by using dependency injection. This means making some of the function code come from outside.

For this simple function, we can use callbacks: one for formatting the line, one for formatting the whole output:

def default_element_formatter(element, line_format):
    return line_format.format(element=element)

def default_output_formatter(output, line_end):
    return line_end.join(output)


def iformat(
    elements,
    line_format="- {element}",
    line_end="\n",
    format_element=default_element_formatter,
    format_output=default_output_formatter
):
    output = []
    for element in elements:
        output.append(format_element(element, line_format))
    return format_output(output, line_end)

Well, clearly the function is not simple anymore. Not only must somebody looking at the code understand function references and their use as callbacks, but that person must now also read two more functions to understand the implementation.

Keeping up with the trend, for this price, the use cases we wanted are still preserved:

>>> print(iformat([1, 2, 3]))
- 1
- 2
- 3
>>> print(iformat([1, 2, 3], "{element}", ", "))
1, 2, 3

But now if some user comes up with something we didn't think about, they may be able to plug it in. E.g., let's append to every line the number of letters it contains:

>>> def format_with_size(element, line_format):
...     return line_format.format(element=element) + f" ({len(element)})"
...
>>> fruits = ['banana', 'pear', 'apple']
>>> print(iformat(fruits, format_element=format_with_size))
- banana (6)
- pear (4)
- apple (5)

At this stage, you could argue this can be achieved by transforming the data before it gets into the function. But we don't have to dictate that. We just provide the hook, and the user will decide whether it's worth using or not.
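
To compare the two approaches side by side, here is a rough sketch. The fruits list and the format_with_total callback are hypothetical, made up for this illustration:

```python
def default_element_formatter(element, line_format):
    return line_format.format(element=element)


def default_output_formatter(output, line_end):
    return line_end.join(output)


def iformat(
    elements,
    line_format="- {element}",
    line_end="\n",
    format_element=default_element_formatter,
    format_output=default_output_formatter,
):
    output = []
    for element in elements:
        output.append(format_element(element, line_format))
    return format_output(output, line_end)


fruits = ["kiwi", "fig"]

# Option 1: transform the data before it reaches the function.
pre_transformed = iformat(f"{fruit} ({len(fruit)})" for fruit in fruits)


# Option 2: leave the data alone and plug a callback into the hook.
def format_with_size(element, line_format):
    return line_format.format(element=element) + f" ({len(element)})"


hooked = iformat(fruits, format_element=format_with_size)


# The second hook works the same way, on the whole output this time.
# format_with_total is a hypothetical callback that appends a footer.
def format_with_total(output, line_end):
    lines = list(output)
    return line_end.join(lines + [f"Total: {len(lines)}"])


with_total = iformat(fruits, format_output=format_with_total)
```

Both options give the same string for this data, so the choice is mostly about where you want the formatting logic to live.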

Documentation

Now that this baby is a swiss-army callable, we can add some docstrings to explain how it's used:

def default_element_formatter(element, line_format):
    """Format an element for iformat() with the default behavior

    :param element: any object that is the main element of the current line
    :param line_format: the string to use to format the main element.
        It should be compatible with str.format() syntax.

    :returns: the string for a single line with this element
    """
    return line_format.format(element=element)

def default_output_formatter(output, line_end):
    """Format all lines for iformat() with the default behavior

    :param output: the collection of lines iformat() gets from format_element.
    :param line_end: the string to use to separate each line.

    :returns: the output string of iformat()
    """
    return line_end.join(output)


def iformat(
    elements,
    line_format="- {element}",
    line_end="\n",
    format_element=default_element_formatter,
    format_output=default_output_formatter
):
    """Turn an iterable of elements into a string ready to be printed

    :param elements: an iterable of objects to format
    :param line_format: the str.format() compatible string to use to format
        one element. Use "{element}" to get access to the object to format.
    :param line_end: a string to separate elements from each other
    :param format_element: callback to pass if you wish to have full control over
        the formatting for individual elements. See default_element_formatter()
        for the implementation.
    :param format_output: callback to pass if you wish to have full control over
        the formatting for the collection of elements. See default_output_formatter()
        for the implementation.
    """
    output = []
    for element in elements:
        output.append(format_element(element, line_format))
    return format_output(output, line_end)

This will not affect the way the code runs in any way, but now a user can get help on each function:

>>> help(iformat)
iformat(elements, line_format='- {element}', line_end='\n', format_element=<function default_element_formatter at 0x7f5345807d30>, format_output=<function default_output_formatter at 0x7f53457b71f0>)
    Turn an iterable of elements into a string ready to be printed

    :param elements: an iterable of objects to format
    :param line_format: the str.format() compatible string to use to format
        one element.
    :param line_end: a string to separate elements from each other
    :param format_element: callback to pass if you wish to have full control over
        the formatting for individual elements. See default_element_formatter()
        for the implementation.
    :param format_output: callback to pass if you wish to have full control over
        the formatting for the collection of elements. See default_output_formatter()
        for the implementation.

As a bonus, tooling can now display it: your editor can show a help tooltip, the documentation builder can create a website with the function docs, etc.
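
Tooling typically gets this information through introspection. As a minimal sketch of what is available at runtime, using a shortened iformat() to keep the example small:

```python
import inspect


def iformat(elements, line_format="- {element}", line_end="\n"):
    """Turn an iterable of elements into a string ready to be printed

    :param elements: an iterable of objects to format
    """
    output = []
    for element in elements:
        output.append(line_format.format(element=element))
    return line_end.join(output)


# The raw docstring lives on the function object itself.
raw_doc = iformat.__doc__

# inspect.getdoc() returns it with the indentation cleaned up.
clean_doc = inspect.getdoc(iformat)

# inspect.signature() exposes the parameter names and their defaults.
parameters = inspect.signature(iformat).parameters
defaults = {
    name: parameter.default
    for name, parameter in parameters.items()
    if parameter.default is not inspect.Parameter.empty
}
```

This is roughly the kind of data an editor tooltip or a documentation builder has to work with, although each tool has its own way of collecting it.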

Moar documentation

Our effort to document it can go even further by annotating the parameters using type hints:


from typing import Any, Callable, Iterable

def default_element_formatter(element: Any, line_format: str) -> str:
    """Format an element for iformat() with the default behavior

    :param element: any object that is the main element of the current line
    :param line_format: the string to use to format the main element.
        It should be compatible with str.format() syntax.

    :returns: the string for a single line with this element
    """
    return line_format.format(element=element)


def default_output_formatter(output: Iterable[str], line_end: str) -> str:
    """Format all lines for iformat() with the default behavior

    :param output: the collection of lines iformat() gets from format_element.
    :param line_end: the string to use to separate each line.

    :returns: the output string of iformat()
    """
    return line_end.join(output)


def iformat(
    elements: Iterable,
    line_format: str = "- {element}",
    line_end: str = "\n",
    format_element: Callable[[Any, str], str] = default_element_formatter,
    format_output: Callable[[Iterable[str], str], str] = default_output_formatter,
) -> str:
    """Turn an iterable of elements into a string ready to be printed

    :param elements: an iterable of objects to format
    :param line_format: the str.format() compatible string to use to format
        one element. Use "{element}" to get access to the object to format.
    :param line_end: a string to separate elements from each other
    :param format_element: callback to pass if you wish to have full control over
        the formatting for individual elements. See default_element_formatter()
        for the implementation.
    :param format_output: callback to pass if you wish to have full control over
        the formatting for the collection of elements. See default_output_formatter()
        for the implementation.
    """
    output = []
    for element in elements:
        output.append(format_element(element, line_format))
    return format_output(output, line_end)

Again, this changes little about Python's behavior, but now somebody looking at the function, or a tool, can use the information given by the types to make decisions. A developer can use "mypy" to check that they are passing the proper arguments to it.
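
To illustrate that the hints are information rather than enforcement, here is a small sketch. join_lines() is a hypothetical, reduced stand-in for iformat(), used only to keep the example short:

```python
from typing import Iterable, get_type_hints


def join_lines(lines: Iterable[str], line_end: str = "\n") -> str:
    # Hypothetical reduced stand-in for iformat(), kept short on purpose.
    return line_end.join(lines)


# Annotations are stored as metadata that tools can read...
hints = get_type_hints(join_lines)

# ...but Python does not check them when the function is called.
# A type checker such as mypy is expected to flag the call below. At
# runtime it only fails once the int turns out to have no .join() method.
try:
    join_lines(["a", "b"], line_end=5)
except AttributeError as error:
    runtime_error = error
```

Without a type checker, the mistake only shows up when the faulty line actually runs, which is the gap the annotations help close.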

This is now quite a lengthy piece of code, and we have left scripting for industrialization. The code brings a lot to the table, but also assumes a lot: a maintainer must have more knowledge and resources at her disposal than with the previous examples to invest in it.

The best version of all

So we took some extremely basic version of this algo, and turned it into a much richer one, at the cost of complexity.

So what's the best version of them?

Well, none of course.

The reason we can code each of those versions at all is that they are all very useful.

Each of them offers a compromise between cost and features.

You may want the very first simple one, because that's all you need, and you don't want to spend more time on such a trivial problem.

You may want the full-featured last one because it will be included in a library you are going to ship, and you want to provide as much utility with it as you can.

Or anywhere in between.

Programming is ultimately engineering, which means it's ROI decision after ROI decision.

Python's strength is that it allows for a very large spectrum to choose from.

Tom

May 23, 2023 (edited)

> Now that this baby is a swiss-army callable

Lol

I have always wondered how to do dependency injection when interfacing with external APIs so that I can use fake data for testing and then use the real endpoints by default when running live. Using default callables but then calling with fakes in testing seems like a great way to do that so thank you.

I have been programming in Python for 5+ years and async is still really confusing to me, not to mention greenlets/Gevent, trio, and Twisted. I know that under the hood it is somehow just syntactic sugar over a generator. Could you do a post on this?