Testing with Python (part 4): why and what to test?
Delaying the chores by second-guessing all your decisions
Summary
In the previous article, I promised we would move from purely learning the mechanics to the questions everybody asks themselves when learning testing.
To define what to test, you first need to start with why you are testing. There are a few main purposes that most people agree on, and you need to know where your needle sits on each spectrum:
Purpose 1: to avoid regressions
Purpose 2: to manage quality
Purpose 3: to match the specs
Purpose 4: to dilute responsibility
Purpose 5: to reassure you
Purpose 6: to learn testing
Purpose 7: to check a box
Once you know where you stand, you need to draw the borders of your territory, and list the constraints within which you will try to reach those goals: what resources are at your disposal? How much time can you dedicate to testing? What is the budget, and how many hoops do you need to jump through to get work done in your environment?
Then, and only then, you can start deciding what part of your code to test and how much of it.
When in doubt, remember that the closer you get to the public part of your system (e.g.: the UI), the more of your system you exercise, and the more you test at once. It's a great short-term gain, but those are also the tests that break most easily and bring the least quality to the code. It's Pareto, though. And you know how much I love Pareto.
Why are you testing?
To decide what to test, how much and in which order, you need to figure out the reasons you are testing in the first place.
Testing is a lot of work: it takes time, effort and resources. To justify it, it must serve a purpose, and you have to define what that purpose is, for each project. This doesn't need to be an agonizing decision; with practice, you can usually evaluate it on the spot.
Purpose 1: to avoid regressions
This is the most common purpose of testing: ensuring that when someone modifies the code, we don't break what was already working. It's also the most Pareto and realistic purpose, because it doesn't require 100% of all the edge cases to be covered to already give you good returns. A few good tests on the most used paths of your code will ensure most of your users don't get mad, most of the time.
Most people focus on unit tests for this, but experience taught me that a few good end-to-end ones (we will define that later) can take you very far.
It's my number one reason for testing.
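To make this concrete, here is a minimal sketch of a regression test on a "most used path", in pytest style. The `slugify` function is made up for the example, not something from this series:

```python
# A regression test on the most common path of a hypothetical function.
# Once this exists, nobody can silently break the behavior most users rely on.

import re

def slugify(title: str) -> str:
    """Turn an article title into a URL slug (example function)."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def test_slugify_most_common_path():
    # The input shape most users actually produce: a plain English title.
    assert slugify("Testing with Python, part 4") == "testing-with-python-part-4"
```

If that test was written when the feature shipped, anyone refactoring `slugify` later finds out immediately if they broke the path everybody uses.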
Purpose 2: to manage quality
Adding tests will likely increase the quality of your software, provided you don't overdo it and actually ship regularly. :)
It's a nice side effect, but you can also deliberately choose to leverage testing for that purpose:
it will reduce the bugs in your code base by forcing you to check your assumptions, put edge cases in front of you and exercise the code regularly.
it will force you to think about your API, and make it testable, therefore less coupled.
it will make you reconsider the scope of your project, because testing is costly, so you will have to think, "is this feature worth it?".
Unfortunately, this raises the questions: what is quality? What level of quality do you target? How do you measure it?
It's a rabbit hole, and companies rarely have a list of objective, testable quality metrics. When they do, those metrics tend to end up being gamed.
Unless you are NASA, go with the flow on this one, especially if you have a customer on the other side who will influence it along the way. Experience helps a lot, so don't be discouraged if you suck at it at first.
Deciding to set a low quality bar is a perfectly fine decision to make. But you have to make it consciously.
Purpose 3: to match the specs
Tests are a practical, readable and always up-to-date demonstration of how your system works. Therefore they are great to prove you follow specs or standards. If this is your goal, you need to decide right now, because it requires discipline and the whole team on board. In a way, it's easier than "managing quality", because whether you match the spec or not is more objective (although IRL practice has a way of playing tricks with that one too). But it can be a lot of work: it's tedious, repetitive, detail-oriented and requires a lot of documentation.
It has a different, less costly and kinda wonderful consequence: tests are documentation and a reference on how your code works. They help beginners understand how your code base ticks, make on-boarding easier and serve as a reference, for you and your AI plugins.
Even if you don't match a spec, I recommend always keeping at least this documentation objective in mind, because it quickly pays dividends. This means covering the most common use cases, using good names for your tests and exercising your public API. I even make some tests a bit too big, so that they read like an executable, guaranteed-to-be-correct tutorial.
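As an illustration, here is what such a "documentation" test might look like: deliberately a bit big, named after the use case, and only touching the public API. `ShoppingCart` is invented for the example:

```python
# A test that doubles as a tutorial: its name describes the scenario,
# and its body shows a newcomer how the public API is meant to be used.

class ShoppingCart:
    """Toy class, invented for the example."""

    def __init__(self):
        self._items = {}

    def add(self, name: str, price: float, quantity: int = 1):
        _, qty = self._items.get(name, (price, 0))
        self._items[name] = (price, qty + quantity)

    def total(self) -> float:
        return sum(price * qty for price, qty in self._items.values())

def test_a_user_fills_a_cart_and_gets_the_total():
    cart = ShoppingCart()
    cart.add("book", 10.0)
    cart.add("pen", 2.0, quantity=3)
    cart.add("book", 10.0)  # adding the same item twice increases its quantity
    assert cart.total() == 26.0
```

A reader who has never seen `ShoppingCart` learns the whole workflow from one test, and the test suite guarantees the tutorial never goes stale.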
Purpose 4: to dilute responsibility
If you are working in the corporate world, you will have to ensure two things:
ensure legal compliance;
avoid being blamed.
This is a purpose nobody speaks about online, because it's a dirty topic, but it's the reality of millions of workers.
If this is your goal, you'll have to check with the legal department what you need to comply with, and dedicate a test section with copious matching comments explaining what the code protects against. E.g.: demonstrating that user data can be exported for GDPR reasons.
Sometimes, it's not about legal goals, but about encoding decisions, so that if something goes wrong, you can justify it. E.g.: your client said "don't check for this", you got the confirmation by email, and you write a test with a comment stating that you don't check for this on purpose.
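A sketch of what such a decision-encoding test could look like. The scenario, names and date are all invented for illustration:

```python
# A test whose main value is the paper trail: the comment records who
# decided what, so the "why" survives in the test suite.

def validate_order(order: dict) -> bool:
    """Toy validation function, invented for the example."""
    # We deliberately do NOT validate the postal code format.
    # The client confirmed by email (hypothetical: "Re: address checks",
    # 2024-03-12) that orders must never be rejected on postal codes.
    return bool(order.get("address"))

def test_orders_are_accepted_without_postal_code_validation():
    """Per client request (see email trail), postal codes are not checked."""
    assert validate_order({"address": "1 Main St", "postal_code": "not-a-code"})
```

If that check is ever "fixed" by a well-meaning colleague, the test fails and points straight at the decision and its source.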
All that is very contextual, and perfectly optional. Add it to your plate depending on your environment.
Purpose 5: to reassure you
When a system becomes complex, and all growing systems eventually reach that stage, tests will be your lifeline.
If a project is critical, if people's lives can be affected, if your career is on the line, then seeing thousands of tests passing with a bright green check is going to have a very positive effect on your blood pressure.
This requires heavy investment, though: if you need this, you will need to test a lot, so prepare the time and resources for it.
Purpose 6: to learn testing
It is absolutely a valid goal.
You need to learn the stuff, not just the syntax, but the process. There is no way around it, practice is key.
For this, you need to try a lot of different types of testing, but you don't need to test everything.
This is usually the opposite of "managing quality", so don't do it on a critical project, or one where you are on a deadline.
Your first experience with tests is unlikely to produce great results. Use something where either you are allowed to screw up, or you don't care about the consequences of screwing up.
I'll let you work out the full implications of this.
Purpose 7: to check a box
Life can be pretty stupid sometimes, and yes, like Scott Jurek says in the excellent Eat and Run:
Sometimes, you just do things.
Question the source of the constraint so you know what level of compliance you need, and this will tell you how much work to invest.
Again, the key is to make it explicit. Don't let it happen to you, decide.
Because at some point, you will be responsible for the time it takes and the resources consumed.
E.g.: your team leader is adamant that the color of the header needs to be tested. Your client says they want a "full comprehensive test suite" for their screenshot utility. Figure out what they mean by that, then suggest what it would take: the nature of the solution, how they will validate its success, and its costs. Then get their approval. You now have your box to check.
Price accordingly. Do not underprice. In fact, overestimate your gut feeling, then overprice that. I don’t mean just money.
This tends to make people clearer, and more reasonable.
What are your constraints?
Now you know what you want to do, but can you do it? We don't live in a dream universe, laws of physics apply.
If you have one person dedicated to define specs, you will have a very different experience than if you have to extract them yourself from the brain of whatever mastermind is your contact, client or user. If you have a few days to hack on the code, or 5 years, the amount of testing you can bring on will be very different.
You can test the happy path and a few failure cases manually, or you can write a complete fuzzing suite that checks thousands of things randomly, generates reports and sends alerts when something goes wrong.
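The "check thousands of things randomly" end of that spectrum can be toy-sketched in a few lines. The round-trip property and the `encode`/`decode` pair are invented for the example; real fuzzing setups are far more elaborate:

```python
# A toy randomized test: throw many random inputs at a property that
# should always hold, instead of hand-picking a few examples.

import random
import string

def encode(s: str) -> bytes:
    return s.encode("utf-8")

def decode(b: bytes) -> str:
    return b.decode("utf-8")

def test_encode_decode_round_trip_on_random_inputs():
    rng = random.Random(42)  # seeded, so any failure is reproducible
    for _ in range(1000):
        s = "".join(
            rng.choice(string.printable)
            for _ in range(rng.randint(0, 50))
        )
        assert decode(encode(s)) == s
```

Libraries like Hypothesis industrialize this idea, but even this naive loop catches bugs that three hand-written examples would miss.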
But you need somebody to do all that. Is that you? Someone else? How much can this person dedicate to it?
On top of that, you need to know the skills of the people at your disposal.
If you have (or are) a junior dev, chances are you will take more time, make many mistakes and have to second-guess a lot of things. Also, if some tests are critical, because failure can cost money, kill people or get you sued, you need a different skill level than if you are testing the contact form of your grandma's knitting blog.
That's why defining the purpose of testing was important in the first place.
Also, you have to know how much time and budget you have for the whole thing. Once again: testing takes time. The more you test, the more time it takes. People will tell you it saves time in the long run. That can be true, but not always; it depends on the stakes, and on the rate at which things change and force you to update the tests. And even when it does save time, it will slow you down at the beginning and only get you the ROI later. You pay the entry cost right now. How much time testing will take is super hard to evaluate, even with experience, so give yourself plenty of margin, state it loudly to the people you report to, and rethink it often.
You might hear the argument that untested software is broken software. Indeed. However, the world runs on broken software, because there are a lot of things that can be valued more than the absence of edge case bugs. You get to decide, yeah!
It will take you a lot of effort when doing it for the first time, but the more you do it, the more automatic it becomes. I never think of this list myself; I just ask these questions in my head out of habit. So yes, it looks like a lot of boring work, but that's not how it feels once you practice. Practice, once again, will make it flow.
Finally, you need to know what environment you have to test in. This will affect what type of tests you can do and how productive you will be. By that I mean things like:
are you forced to use a locked-down laptop?
can you choose the libs and tooling to test?
can you install them yourself or do you need to request them?
do you have access to a Continuous Integration system (e.g.: GitHub Actions)?
if you have to interact a lot with an IT department, how easy is it?
Etc.
This seems like nothing and is completely overlooked by most discussions on testing, but it will affect what you can do, your velocity and your morale. Don't try to test more than your environment comfortably allows; it's a recipe for failure.
You can test much more, and much more easily, with a competent team, excellent tooling and the freedom to take the initiative on any issue. It's crazy how much friction can slow you down, so factor it into the time and price you estimate.
How much to test?
Should you test everything? And what does "everything" mean?
When is something "tested enough?"
To me the gold standard for testing is the SQLite code base. They have 590 times more test code than source code, and they check for things like what happens when power is lost!
And even they are not done.
Their project history contains the following commit message from two days ago (2024-05-04):
Add test cases to test/in7.test. No code changes.
There is no such thing as completely tested software. It. Doesn't. Exist.
So you do have to make a choice about what to test, and again, make that decision explicit. This is where I start, this is where I stop.
So you take all your stated purposes and listed constraints, then start carving out the scope of how much you want to test.
Not testing is fine.
Making only 3 tests is fine.
Testing only the API is fine.
Targeting 100% of code coverage is fine.
As long as you can explain why it is that way.
Do you think I write tests for my scripts or even the snippets for this blog? Hell no, I'm perfectly OK with it failing. The cost of it is acceptable to me.
I worked on a website service that had half a million users and was a money maker. It had zero tests. Sometimes the site was down for hours. Users got a bit pissed off, Google de-indexed some pages, the owner lost money. But he decided to accept that as the cost of being able to work on the website with a very small team, and only part-time.
The cost of adding tests would be too much for him. He is not a great programmer, abstraction is not his forte, and he doesn't want to invest in it. He'd rather spend time adding features. Basically, the users test the software for him in production, and he focuses on fixing what is reported to him via email or Sentry. He moves fast.
At the opposite end of the spectrum, I worked last year for a client that had 500 million euros going through their software every year, in a tremendously regulated industry. In that case we had full specs, and tests that referenced said specs, each covering many different cases. We fed the software with 20 years of historical data and existing results. We had two external audits. Extensive manual testing on top of mandatory code reviews. Full linters, type checkers and automatic releases with a CI. We moved excruciatingly slowly, but we found bugs… in our providers' implementations!
A few rules of thumb
If you have a product, end-to-end testing by automating the GUI is brittle, but also brings the most bang for your buck. When you don't have a lot of time for testing, you can start with that. Same with a public Web API, a command line interface, etc. Bonus: when you are a beginner, this type of testing is easier to conceptualize because it's very concrete, and you don't have to work around side effects, since you are testing the side effects. And it's totally OK to do it quick and dirty at first. You won't get the benefits of better code and architecture with e2e testing, though.
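For a command line tool, a quick-and-dirty end-to-end test can be as small as driving the program the way a user would, through `subprocess`. This sketch tests the Python interpreter itself so that it runs anywhere; you would swap in your own command:

```python
# End-to-end test of a CLI: spawn the real program, check exit code and
# output, exactly like a user at a terminal would judge it.

import subprocess
import sys

def test_cli_happy_path():
    result = subprocess.run(
        [sys.executable, "-c", "print('hello')"],  # replace with your CLI
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0
    assert result.stdout.strip() == "hello"
```

Nothing here knows about the program's internals, which is exactly why this kind of test survives refactoring so well, and also why it tells you little about where a failure comes from.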
And yes, the series will show an example of how to test a script, how to test a UI, etc. I'm starting to wonder if I have a chance to finish it before the end of the year.
If you have a library, test the public API in priority if you can. Going top to bottom exercises most of your stack as early as possible while preventing you from losing yourself in the details. It will also force you to think about the boundaries of your program and the service it delivers, something geeks have a hard time doing. It's so much more beautiful to unit test a clean, single, perfect little function. Testing top to bottom is not always possible, though, because if you start a system from scratch, you may need to create a lot of low-level tooling first.
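A minimal sketch of the idea, with an invented one-function "library": the internal helper is exercised through the public entry point, never directly, so the internals stay free to change:

```python
# Testing a library through its public API only. The underscore-prefixed
# helper is covered indirectly, so refactoring it never breaks a test.

import statistics

def _clean(values):
    """Internal helper (invented): drop missing values."""
    return [v for v in values if v is not None]

def average_rating(values):
    """The public API, tested top to bottom."""
    cleaned = _clean(values)
    return statistics.mean(cleaned) if cleaned else 0.0

def test_average_rating_ignores_missing_values():
    assert average_rating([4, None, 5, 3]) == 4.0

def test_average_rating_of_nothing_is_zero():
    assert average_rating([None, None]) == 0.0
```

If `_clean` is later rewritten, inlined or split in two, both tests keep passing as long as the service the library delivers is intact.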
If reliability is of the essence, and you have a spec, then test the components first, as specified, and name/comment/reference each test according to the part of the spec it matches. Use type hints, especially for models and units. If you have to support several versions of a spec, version and namespace the tests too. The higher the quality requirements, the more you need to complement unit tests with acceptance testing, back testing, fuzzing and property testing.
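Here is one way the spec-referencing naming scheme might look. The spec ID "SPEC-4.2.1" and the rounding rule are invented; the point is that every test is traceable back to a paragraph of the spec:

```python
# Tests named and commented after the spec section they verify, so an
# auditor can map each requirement to the test that covers it.

from decimal import Decimal, ROUND_HALF_EVEN

def round_amount(amount: Decimal) -> Decimal:
    """SPEC-4.2.1 (hypothetical): amounts rounded half to even, 2 decimals."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

class TestSpecV2Section4Rounding:
    def test_spec_4_2_1_half_even_rounding(self):
        # SPEC v2, section 4.2.1: ties go to the even digit ("banker's rounding")
        assert round_amount(Decimal("2.675")) == Decimal("2.68")
        assert round_amount(Decimal("2.665")) == Decimal("2.66")
```

Grouping tests in a class named after the spec version also gives you the namespacing mentioned above when several versions must coexist.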
Purity is often a terrible way of deciding on priorities. 100% test coverage is rarely a good goal early in a project; don't focus on it too much, as the last few percent are the most expensive to get and bring the least benefits. The 500 million euro project was capped at 95%. It's also OK to mix concepts and be imperfect. You may have some side effects in your unit tests. You may exercise only some branches and some happy paths. You may not test for some errors. You may have tests that sit between e2e and unit testing, but are not exactly either. Don't let the best be the enemy of the good. Stay on track with your initial stated purpose, and remember your constraints. That's why we started with them. They are your guideline.
Of course, at this stage, you may not know what coverage, property testing or fuzzing are.
So this will be explained in the next articles of the series. In fact, the next one is about the different types of testing.