Summary
(Or go to the previous article in the series)
Should you be pragmatic and use integrated tests:
def test_add_new_item_to_cart(product, cart):
    new_product = Product.objects.create(
        name='New Product',
        price=15.00
    )
    new_cart_item = add_item_to_cart(cart, new_product, 1)
    assert new_cart_item.cart == cart
    assert new_cart_item.product == new_product
    assert new_cart_item.quantity == 1
Or strive for isolation and mock your way out of combinatorial explosions:
def test_add_to_cart_endpoint():
    product_mock = mock()
    cart_mock = mock()
    cart_item_mock = mock(quantity=1)
    user_mock = mock(cart=cart_mock)
    request_mock = mock(user=user_mock)

    (when(get_object_or_404).called_with(Product, id=1)
        .thenReturn(product_mock))
    (when(add_item_to_cart).called_with(cart_mock, product_mock, 1)
        .thenReturn(cart_item_mock))

    response = add_to_cart(request_mock, CartSchema(product_id=1, quantity=1))

    assert response == {"id": 1, "quantity": 1}
Both will work.
The first one is easy to write and read, but it comes with a lot of baggage that will feel heavy in the long run. If there is a long run, that is. After all, many tests are regularly discarded, so why spend too much time on them?
The second one can build a pyramid of trust that assumes every dependency is tested and working. If you start by testing the foundation, then by induction, everything above can be assumed to work. And each test is clean, independent, fast.
Or are they? Are you really sure you are testing the feature, and not being deceived by your mocks always saying yes? After all, those tests are harder to read and write.
So team blue, or team red?
I'm a Versus with Turtle rising
Despite its reputation, IT is kind of a spiritual field, with strong opinions held as beliefs, applied dogma, cult followings, and charismatic leaders.
This comes with the usual red vs blue sports team oppositions, like Mac vs Windows, Python vs Ruby, and Vim vs Emacs, even if the latter was already obsolete 20 years ago and probably means absolutely nothing to half of you, dear readers.
Well, it's turtles all the way down, and you get those kinds of oppositions at every level. People will debate tabs vs spaces, methods vs functions, and whether or not you should use an early return this time, because it's a try/except and maybe in this case...
So of course testing, which is held in great regard by the priests of the technically correct, comes with those lovely dualities and none is more important than:
Isolated VS Integrated
However, despite this looking like a pointless debate, it has, in fact, important consequences for what your tests end up being.
You know where I'm going, it's spectrum time!
Phrasing!
You'll notice all around the web that people use "integration tests" and "integrated tests" interchangeably, which makes the whole thing blurry. I'm guilty of this myself.
When writing this article, I had to backtrack to see how consistent I was with "integration" vs "integrated". I was not, so I fixed all the series before writing this one.
To make sure we are on the same page:
Integrated test: a test on non-isolated code.
Integration test: testing that parts fit together. Integration tests are integrated tests, but not the other way around.
e2e tests: testing the whole span of a program. e2e tests are the heaviest form of integration tests.
Continuous integration: every time you commit, you package and then install the whole system and run all the tests.
Wait, what do I mean by "isolated"?
The usual definition of a "unit test" is a test that checks a small part of the code in isolation. So, with this definition, a unit test is the opposite of an integrated test. This is the definition I used a few posts ago, even quoting Wikipedia on this.
BUT!
There is a school of thought that will tell you we got that all wrong (like in this excellent conf talk by Ian Cooper), that "unit" applies to the test itself, not the code being tested.
For them, the "unit" qualifier is meant to state that each test is independent from other tests. But the code being tested doesn't have to be independent, and so an integrated test can be a unit test.
Gotta love the ability of our community to make things as unclear as possible.
So, while I had to use the words "unit test" for most of the series because you would have looked at me weird, starting from now I'm going to switch to a more specific terminology:
"isolated tests" for when you test a part of the code in isolation, and so you create mocks for everything it depends on.
"integrated tests" for tests that don't care about isolation.
Whether those are "unit" tests or not, I'll let you choose.
Integrated test for an API endpoint
Let's say we have an HTTP API endpoint using the nice django-ninja lib, to add a product to a cart on a shopping website:
# WARNING: Since the project doesn't actually exist, I write
# this code and the associated tests without running it, so read
# it to get the general idea, don't take it as exactly what
# you should write
import logging

from django.shortcuts import get_object_or_404
from django.db import transaction

from .models import CartSchema, CartItem, Product
from .endpoints import api

log = logging.getLogger(__name__)


@api.post("/cart")
def add_to_cart(request, payload: CartSchema):

    product = get_object_or_404(Product, id=payload.product_id)

    # Add an item to the cart if it doesn't exist, or update
    # the quantity if it does. We do that with locked rows in
    # a transaction to deal with the potential race condition.
    with transaction.atomic():

        locked_qs = CartItem.objects.select_for_update()
        cart_item = locked_qs.filter(
            cart=request.user.cart,
            product=product
        ).first()

        if cart_item:
            cart_item.quantity += payload.quantity
            cart_item.save()
        else:
            cart_item = CartItem.objects.create(
                cart=request.user.cart,
                product=product,
                quantity=payload.quantity
            )

    log.info(
        'Product %s added to user %s cart',
        product.id,
        request.user.id
    )

    return {"id": payload.product_id, "quantity": cart_item.quantity}
Integrated tests for that endpoint will use django-ninja's test client. Since it hits the test database, we will need fixtures:
import pytest

from django.contrib.auth.models import User
from ninja.testing import TestClient

from .models import Product, CartItem, Cart
from .endpoints import api


@pytest.fixture
def user(db):  # db fixture assumed from pytest-django
    user = User.objects.create_user(
        username='testuser',
        password='testpass'
    )
    Cart.objects.create(user=user)
    return user


@pytest.fixture
def client(user):
    return TestClient(api)


@pytest.fixture
def auth_client(client, user):
    client.authenticate(user)
    return client


@pytest.fixture
def product(db):
    return Product.objects.create(name='Test Product', price=10.0)


@pytest.fixture
def cart_item(db, user, product):
    return CartItem.objects.create(
        cart=user.cart,
        product=product,
        quantity=1
    )
Then we need to test that:
Auth is required.
A non-existing product ID returns a 404.
A non-existing cart item is created.
An existing cart item's quantity is incremented.
The API return format matches our expectations.
def test_user_not_authenticated(product, client):
    payload = {'product_id': product.id, 'quantity': 1}
    response = client.post('/cart', json=payload)
    assert response.status_code == 401


def test_non_existing_product_raises_404(auth_client):
    payload = {'product_id': 9999, 'quantity': 1}
    response = auth_client.post('/cart', json=payload)
    assert response.status_code == 404


def test_cart_item_quantity_incremented(auth_client, product, cart_item):
    payload = {'product_id': product.id, 'quantity': 1}
    response = auth_client.post('/cart', json=payload)
    assert response.status_code == 200
    cart_item.refresh_from_db()
    assert cart_item.quantity == 2


def test_cart_item_created_if_not_exist(auth_client, user, product):
    payload = {'product_id': product.id, 'quantity': 1}
    response = auth_client.post('/cart', json=payload)
    cart_item = CartItem.objects.get(cart=user.cart, product=product)
    assert response.status_code == 200
    assert cart_item.quantity == 1
    assert response.json() == {"id": product.id, "quantity": 1}
# You may want to write more tests, like checking that your schema
# properly rejects badly formatted requests, or what happens with
# contention on locked rows, but this is enough to make our point.
The logic is straightforward: you use the API as it's intended. If you know HTTP, django-ninja, and pytest, the code is actually self-describing. I swear.
It doesn't require many design decisions, and in fact, ChatGPT is very good at writing such tests, so you can cut down on the time it takes to write the boilerplate.
It also tests that all those things work together, and that the routing, auth, and the rest of the machinery behave like you think they do.
On the other hand:
It's slow since it has to deal with a test DB.
Each test depends on the whole thing. If one piece breaks, the whole test suite explodes.
Changing the code means changing all those tests, since all parts of the system depend on each other.
What you are testing is not obvious at first glance, because of all the noise.
I like those tests because they are cheap and relatively easy to implement. Tests are often disposable, especially in the early stages of a project when the code base moves a lot.
So having stuff that costs little to make, and that you don't think twice about discarding, has great value. Plus, everybody can get involved in this test suite, even the intern.
But what if we don't want disposable tests? What if we want a robust, fast test suite that will demonstrate the whole system is sane now, and for years to come?
Isolated tests can help with that.
Same endpoint, but with isolated tests
Isolated tests drive design because to create good isolation, you need clear, distinct parts that you can, in fact, isolate. In our case, if we pause for a second, we can see that the cart item creation could be moved into a separate function:
def add_item_to_cart(cart, product, quantity):
    """Create a cart item, or update the quantity of an existing one.

    We do that with locked rows in a transaction to deal with
    the potential race condition.
    """
    with transaction.atomic():

        locked_qs = CartItem.objects.select_for_update()
        cart_item = locked_qs.filter(cart=cart, product=product).first()

        if cart_item:
            cart_item.quantity += quantity
            cart_item.save()
            return cart_item

        return CartItem.objects.create(
            cart=cart,
            product=product,
            quantity=quantity
        )
@api.post("/cart")
def add_to_cart(request, payload: CartSchema):
product = get_object_or_404(Product, id=product_id)
cart_item = add_item_to_cart(request.user.cart, product, quantity)
log.info(
'Product %s added to user %s cart',
product_id,
request.user.id
)
return {"id": product_id, "quantity": cart_item.quantity}
This will make add_item_to_cart
something we can mock, isolating the database side effects. It also has the nice property of increasing our separation of concerns, making each part of the code clearer. We could have done that when writing integrated tests, but integrated tests didn't drive us to do so because there was no need for it.
Given this new design, we can decide on how many integrated tests we want.
If you are a purist, you could decide to mock the whole CartItem
ORM and check that you use it appropriately. I think this would take things too far. First, it would be a very ugly test; second, you would still need to test it against a real DB one day anyway, since reality will catch up with you in the form of unknown edge cases if you don't.
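Just to show what that would look like, here is a minimal sketch of the fully mocked version. I'm using the standard library's unittest.mock instead of mockito here, and assuming add_item_to_cart lives in a views module; adapt the imports to your layout:

from unittest.mock import MagicMock, patch

from . import views          # assumed module holding add_item_to_cart
from .models import CartItem


def test_add_existing_item_with_fully_mocked_orm():
    cart, product = MagicMock(), MagicMock()
    existing_item = MagicMock(quantity=1)

    # Stub the whole .select_for_update().filter(...).first() chain
    locked_qs = MagicMock()
    locked_qs.filter.return_value.first.return_value = existing_item

    # We also have to neutralize transaction.atomic(), otherwise it
    # would try to open a real DB connection. Ugly, as promised.
    with patch.object(views, "transaction"), \
         patch.object(CartItem.objects, "select_for_update", return_value=locked_qs):
        result = views.add_item_to_cart(cart, product, 2)

    locked_qs.filter.assert_called_once_with(cart=cart, product=product)
    existing_item.save.assert_called_once()
    assert result is existing_item
    assert existing_item.quantity == 3

Every line of that test is about plumbing, not behavior, which is exactly why I wouldn't go there.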
So we will still write integrated tests for add_item_to_cart
, but we will need fewer fixtures:
@pytest.fixture
def cart(db):
    user = User.objects.create_user(
        username='testuser',
        password='testpass'
    )
    return Cart.objects.create(user=user)


@pytest.fixture
def product(db):
    return Product.objects.create(name='Test Product', price=10.00)
def test_add_existing_item_to_cart(cart, product):
    cart_item = CartItem.objects.create(
        cart=cart,
        product=product,
        quantity=1
    )

    updated_cart_item = add_item_to_cart(cart, product, 2)

    cart_item.refresh_from_db()
    assert cart_item.quantity == 3
    assert updated_cart_item == cart_item
def test_add_new_item_to_cart(product, cart):
    new_product = Product.objects.create(
        name='New Product',
        price=15.00
    )

    new_cart_item = add_item_to_cart(cart, new_product, 1)

    assert new_cart_item.cart == cart
    assert new_cart_item.product == new_product
    assert new_cart_item.quantity == 1
Because we don't have the fluff of the API around this test, we have fewer levels of indirection. In fact, the tests are already of higher quality than the integrated ones, because we test that the DB records are correct, not that the API returns what we expect. This helps with The Enterprise Developer from Hell problem.
And we can even naturally test more things, like whether the item has been added to the proper cart. Again, we could have done that with our previous test suite, but it didn't encourage us to do it.
Then we move to testing the endpoint. And here is where it gets interesting. Because we test it in isolation, we don't need to test the cartesian product of all possibilities, we can just check that it calls the right things with the right params:
import pytest

from mockito import when, mock

from django.shortcuts import get_object_or_404

from .views import add_to_cart, add_item_to_cart
from .models import CartSchema, Product
def test_add_to_cart_endpoint():
    product_mock = mock()
    cart_mock = mock()
    cart_item_mock = mock(quantity=1)
    user_mock = mock(cart=cart_mock)
    request_mock = mock(user=user_mock)

    (when(get_object_or_404).called_with(Product, id=1)
        .thenReturn(product_mock))
    (when(add_item_to_cart).called_with(cart_mock, product_mock, 1)
        .thenReturn(cart_item_mock))

    response = add_to_cart(request_mock, CartSchema(product_id=1, quantity=1))

    assert response == {"id": 1, "quantity": 1}
Indeed, we don't need to test that the endpoint returns a 404 if the product doesn't exist. get_object_or_404
is already tested by the Django test suite, so we can assume it works. We can just check that we use it.
In the same way, since we already tested add_item_to_cart
, we don't need to check that an item has been added correctly. We can assume it works, and just check that we use it.
This means testing the endpoint is a single test. It's also independent of the details of cart management, routing, authentication, or request/response serialization.
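If you want the "we use it correctly" part to be an explicit assertion rather than something implied by the return value, here is a sketch of the same test using the standard library's unittest.mock (again assuming the endpoint is importable from a views module):

from unittest.mock import MagicMock, patch

from . import views          # assumed module holding add_to_cart
from .models import CartSchema, Product


def test_endpoint_delegates_to_its_dependencies():
    cart = MagicMock()
    request = MagicMock(user=MagicMock(cart=cart))
    cart_item = MagicMock(quantity=1)

    with patch.object(views, "get_object_or_404") as get_404, \
         patch.object(views, "add_item_to_cart", return_value=cart_item) as add_item:
        response = views.add_to_cart(request, CartSchema(product_id=1, quantity=1))

    # Explicitly check that we delegate with the right arguments
    get_404.assert_called_once_with(Product, id=1)
    add_item.assert_called_once_with(cart, get_404.return_value, 1)
    assert response == {"id": 1, "quantity": 1}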
I made the choice to not test the logging call, but you could mock it and test it as well, or just create a handler to capture the content.
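If you go the capture route, pytest already ships with that handler: the built-in caplog fixture. A minimal sketch of the mechanism, outside of any endpoint:

import logging

log = logging.getLogger(__name__)


def test_caplog_captures_log_records(caplog):
    # caplog attaches a capturing handler, so we can assert on messages
    with caplog.at_level(logging.INFO):
        log.info('Product %s added to user %s cart', 42, 7)

    assert 'Product 42 added to user 7 cart' in caplog.text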
Because django-ninja allows you to dump the schema of the API, we can also improve our resilience by checking the entire API signature at once, independently:
import json

import pytest

from django.urls import reverse
from ninja.testing import TestClient

from .endpoints import api
from pathlib import Path

client = TestClient(api)
schema_file = Path('api_schema.json')


# This is naive and will fail even if you just add an endpoint,
# but you get the idea
def test_api_schema():

    if not schema_file.exists():
        # This command doesn't really exist, but you can code it.
        pytest.fail("API schema missing. Run ./manage.py dump_schema.")

    current_schema = client.get(reverse("api:schema")).json()
    saved_schema = json.loads(schema_file.read_text())

    if current_schema != saved_schema:
        pytest.fail("API schema has changed. Follow stability policy.")
Now, if you change the routing or the parameters of your endpoint, no need to update every single test. You can focus on updating the client, changing the API versions, telling users about it, updating the doc, and upgrading your e2e tests (those should exist independently of whether you use isolated or integrated tests for most of your code base).
So blue or red?
A spectrum is a spectrum, and you can of course put the cursor wherever you like. I wrote the article pretending there are two ways of doing it, but we all know we can go to more extremes, or in the middle, mixing both.
I know, I know, I'm weaseling out of a definitive conclusion with "it depends". Welcome to real life, where things suck and absolute truth doesn't exist. I'm waiting for the day we discover the laws of physics are not homogeneous across the entire universe. That will be fun.
So:
Here is the integrated tests example, in one file.
And here is the isolated tests example.
Which one do you prefer?
The integrated tests are currently shorter, but they won't stay that way. As soon as we add more features, the graph of possibilities will grow and the number of tests will either explode or ignore edge cases.
But they are much easier to read because they are a collection of basic concepts.
On the other hand, the isolated tests are more correct: they are actually testing the business logic directly, and not just what the API returns. They can also already warn us when we break our public API in any way, and forever. Since only a fraction touches the DB, they are going to be fast, meaning we have a quick feedback loop.
But the mock magic makes you stop. Do you really know what happens in there? Is it saving your butt, or are you testing your mocks instead of your code base?
Plus, isolated tests are harder to write, require more experience, and more abstract thinking. They are less code in the long run but will take longer to bootstrap. You can't delegate them to ChatGPT (or an intern), it will get them completely wrong.
Then again, by nature, you won't get the combinatorial explosion of all the possible things to test mixing with each other. Your code base will get exponentially more robust and more tested. You won't lie to yourself about how much correctness you cover. Not as much, anyway.
This builds trust incrementally since each test can assume the underlying functions are tested, recursively until you reach the ones actually performing the side effects. You know the ground you cover, and you can rely on this coverage progression.
However, and I'm going to repeat myself, I rarely go full isolated testing.
Most projects are just not that critical. For the typical app, change rates are going to be so fast that having dirty but disposable tests is more important than proving the system is very reliable. Bugs in production will happen, and the boss is likely ok with it although he won't admit it.
Or rather, his bank account is.
Because the hard truth is that very reliable software is super expensive. It takes time, it takes skill, it takes care. Plus, the typical user is quite tolerant of bugs but wants features, yet probably doesn't want to pay much for them.
If you ever wondered why software is so buggy in general, and the advertising industry so powerful, there it is.
Since this is a zero-sum game, spending resources on one thing means not spending them on another. Design is expensive, ergonomics testing is expensive, marketing is expensive.
If you ever wondered why a super robust FOSS app is very ugly or hard to use, there it is.
On top of that, unlike in many industries, the consequences of user-facing issues are not as high as the typical nerd imagines. Most companies pay a small price for a security breach or messages delivered to the wrong user. But don't tweet the wrong thing, you could lose your head!
If you ever wondered why a bug has not been addressed for years, but the icon pack has been updated 3 times to reflect the current trend, there it is.
Anyway, only a few projects reach a huge user count and have very few bugs at the same time.
Even fewer make the isolation distinction.
Let's look at FastAPI's codebase, for example, the file test_datastructures.py, and find an isolated test:
def test_default_placeholder_equals():
    placeholder_1 = Default("a")
    placeholder_2 = Default("a")
    assert placeholder_1 == placeholder_2
    assert placeholder_1.value == placeholder_2.value
What about an integrated test? This one has clear side effects:
def test_upload_file_is_closed(tmp_path: Path):
    path = tmp_path / "test.txt"
    path.write_bytes(b"<file content>")

    app = FastAPI()

    testing_file_store: List[UploadFile] = []

    @app.post("/uploadfile/")
    def create_upload_file(file: UploadFile):
        testing_file_store.append(file)
        return {"filename": file.filename}

    client = TestClient(app)

    with path.open("rb") as file:
        response = client.post("/uploadfile/", files={"file": file})

    assert response.status_code == 200, response.text
    assert response.json() == {"filename": "test.txt"}

    assert testing_file_store
    assert testing_file_store[0].file.closed
Some are a bit of a mixed bag, like:
@pytest.mark.anyio
async def test_upload_file():
    stream = io.BytesIO(b"data")
    file = UploadFile(filename="file", file=stream, size=4)
    assert await file.read() == b"data"
    assert file.size == 4
    await file.write(b" and more data!")
    assert await file.read() == b""
    assert file.size == 19
    await file.seek(0)
    assert await file.read() == b"data and more data!"
    await file.close()
It doesn't have a file system side effect, since the file is in-memory. But it does exercise a whole event loop, which is a very big piece of software that is not part of the VM, unlike in JS. You may argue that it's not self-contained enough to be an isolated test, nor fast enough.
This is where pedantry gets in the way of getting things done.
Does FastAPI care?
No, it doesn't even divide the tests into categories, everything is in a big "tests" directory.
The real world is messy, and isolating tests IRL is harder than in a tutorial. After all, did you notice that our isolated test examples don't check whether we are authenticated? I skipped it because I had no ideal place for it. e2e? With a big for loop on the router? In an OpenAPI custom extension? It's possible of course, it's just... not obvious where it should go.
Nevertheless, you might work on your own baby, or a project in an industry that is very risk-averse, or quality-oriented. If that is the case, and you have the time and resources, you may choose to move the isolation ratio up and enjoy the blissful peace of mind of running an elegant test suite.
After the frustration of the battle that writing them implies, that is.
I've been falling somewhere along the line of mocking the external dependencies and using the real versions of internal ones. The way I see it, if method X depends on method Y, and method X's test passes, then even if method Y is failing, method X is not the reason why.
Who knows, maybe in 6 months I'll be changing my tune. That seems to happen a lot. I learn some principle and argue with ChatGPT about how ridiculous it seems, because I'm not yet writing good enough code to take advantage of the concept. Then down the line I'm refactoring with better skills, and suddenly the thing I was railing against before makes my code a lot better 😁 It's almost like decades of software engineers might know a bit more than somebody who picked it up last year 😂
As the "intern" (and also the senior developer lol), my test writing process is to ask ChatGPT, or usually Claude, to write these tests and then watch them all fail 😅
Then I have to work out whether it's the test that is broken or the method. 90% of the time, it's the test.