Doit: the goodest python task-runner you never heard of
"Just"
UPDATE: if you don’t need a full build system but rather only a task runner, just is what I would recommend over Doit nowadays. However, Doit is still a good option if you need a DAG with dependency checks or if having something in Python is right for you.
Summary
Doit is a sweet Python-based task runner and, optionally, a decent build system. It scales down, making simple things simple:
def task_print_uuid():
    return {
        "actions": [
            "grep UUID /etc/fstab | cut -d' ' -f1 | cut -d'=' -f2",
            "echo done"
        ],
        "verbosity": 2
    }
But it also scales up. Should the need arise, it packs file and task dependencies, a DAG with a cache, per-task parameters, and can even generate tasks on the fly.
doit for you
The more projects you work on, the more you can enjoy tooling. But with a lot of tooling comes the need to abstract those. This is why task runners and build systems multiplied like blog engines, only before blogs were invented.
Now you have the choice between make, cmake, gulp, ninja, maven, gradle, bazel and so many others.
I tried a lot of them and always felt frustrated: some of those tools are not cross-platform, some are very hard to set up, some don't deal with dependencies, some are heavy, and some have a weird DSL.
It's like nothing was at the right level of compromise: either too low level, or too abstracted away, too big or too small, too simple or too complicated, too niche or too corporate. I'm not Google, but I'm not a student either.
Is there anything for an honest coder-plumber?
And one day I tried one more: pydoit.org.
It's not perfect, but it's productive.
It's not overpowered, but it can do the daily stuff.
It's not Usain Bolt, but it gets there in time.
So I kept at it.
Despite the numerous quirks of this little tool, it has fewer dark corners than the competition and was consistently a boost to each project I introduced it to. Cost was minimal. Colleagues picked it up without much fuss.
It didn't win a prize for the most amazing tool in the world.
But it's nice.
Try it.
What does Doit do?
Mostly, it's a CLI tool to run stuff.
You create a task, you give it a name, plus a bunch of actions. The actions are commands and/or Python functions to run. Then, when you call their name, they run.
At first, you use it because who remembers all the options you need to pass to sphinx to build the doc? Or because it's easier than finding out how to set up PYTEST_ADDOPTS for the whole team. Or because you have a different port number for each project dev server, and you are fed up with having to fiddle to find it back every time you switch.
So you create "build_doc", "test", and "dev_server" commands.
Then you are pleased to realize that all your projects with different stacks have the exact same workflow now. This is nice.
Then you start stacking commands into commands. How about "build" calling "test", "build_doc", then "make_package"?
Ah, the CI has become simpler, it just installs stuff and calls doit now.
Then you realize you wish you were not always building the doc when you make the package. Sometimes the doc hasn't changed. But it turns out you can add a few lines to set dependencies on changed files.
Pure Python syntax, no PHONY to forget, no compiler-related flags, works on Windows, Mac, and Linux… There is a lot to like here.
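To give an idea of where that progression ends up, here is a rough sketch of such a dodo.py. The commands themselves (pytest on a tests directory, Sphinx sources in docs, packaging with "python -m build") are assumptions for illustration, not something doit prescribes. "task_dep" and "file_dep" are the doit fields described later in this article.

```python
from pathlib import Path


def task_test():
    # assumed test command, adapt to your project
    return {"actions": ["python -m pytest tests"]}


def task_build_doc():
    return {
        # only rebuild when a .rst source changed or the target is missing
        "file_dep": [*Path("docs").rglob("*.rst")],
        "actions": ["python -m sphinx -b html docs build/html"],
        "targets": ["build/html/index.html"],
    }


def task_make_package():
    # assumed packaging command (requires the "build" package)
    return {"actions": ["python -m build"]}


def task_build():
    return {
        # task names as strings, without the "task_" prefix
        "task_dep": ["test", "build_doc", "make_package"],
        "actions": ["echo build finished"],
    }
```

With this in place, the CI only needs to install the dependencies and call "doit build".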
Doing it
I would say, just "pip install doit", but if you have read the blog before you know I would rather say to activate your virtual environment and run "python -m pip install doit". Or use uv.
Now create a "dodo.py" file at the root of your project. Yes, "dodo.py". Don't ask.
And put the following in it:
def task_hello():
    return {
        "actions": ['echo hello'],
    }
And now run:
python -m doit hello
You will get:
python -m doit hello
. hello
Congrats, you just did it. You created and ran your first task.
I encourage you to use "python -m doit" just in case your PATH is all broken, but if you are in a virtual environment, calling "doit" alone is usually fine. If it’s too long, you can always alias it in your shell as well.
So for the rest of the tutorial, I will simply use "doit" instead of "python -m doit".
Scaling down
I like systems that scale up, but also scale down. Things that I can use with my clients, but also on a dumb personal project.
doit does this very well: if all you want is to run a bunch of commands, you don't need much:
def task_do_stuff():
    return {
        "actions": [
            'command one',
            'command two',
            'command three'
        ],
    }
It will run in your system shell, so you can use the syntax you are used to. It even deals with pipes transparently.
By default, it doesn't show the output, and just runs the things. But you can set verbosity to see what's happening:
def task_print_uuid():
    return {
        "actions": [
            "grep UUID /etc/fstab | cut -d' ' -f1 | cut -d'=' -f2",
            "echo done"
        ],
        "verbosity": 2
    }
Then you can call “doit the_task_name”:
doit print_uuid
. print_uuid
10654268-0809-4f8f-8f8e-e1a2811e1688
7C68-250E
23fa1a22-ab1f-418d-b00f-363ca8f92261
69306B771D56D4AA
done
Of course, I'm on Linux, so this is bash. But on Windows, you can run cmd.exe commands.
However, what about having something cross-platform?
Well, you can pass Python functions as actions:
import datetime as dt

def now():
    print(dt.datetime.now())

def task_print_now():
    return {
        "actions": ["echo you can mix", now, "echo done"],
        "verbosity": 2
    }
It's based on conventions. Everything prefixed with "task_" is a task, everything else is a regular function. And you can use any regular function as an action.
This gives you:
doit print_now
. print_now
you can mix
2023-04-21 07:34:44.499141
done
Tasks can depend on other tasks, so you can compose:
def task_uuid_and_now():
    return {
        # names of the tasks as strings, without the "task_" prefix
        "task_dep": ["print_uuid", "print_now"],
        "actions": ["echo done and done"],
        "verbosity": 2,
    }
doit uuid_and_now
. print_uuid
10654268-0809-4f8f-8f8e-e1a2811e1688
7C68-250E
23fa1a22-ab1f-418d-b00f-363ca8f92261
69306B771D56D4AA
done
. print_now
you can mix
2023-04-21 07:40:18.392034
done
. uuid_and_now
done and done
The default behavior in case of failure is sane; unlike a shell script, it will not just carry on:
def task_fail():
    return {"actions": ["echo start", "%£*,;@f", "echo never reached"], "verbosity": 2}
doit fail
. fail
start
TaskError - taskid:fail
CmdAction Error creating command string
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.10/site-packages/doit/action.py", line 201, in execute
    action = self.expand_action()
  File "/home/user/.local/lib/python3.10/site-packages/doit/action.py", line 315, in expand_action
    return self.action % subs_dict
ValueError: unsupported format character '?' (0xa3) at index 1
If you want to run a long-running command, like a shell or a server, you can mark one particular command so that doit doesn't expect it to stop and lets you type at its prompts:
from doit.tools import Interactive

def task_python():
    return {
        "actions": [
            "echo this is useless but ok",
            Interactive("python3.10"),
        ],
        "verbosity": 2,
    }
doit python
. python
this is useless but ok
Python 3.10.11 (main, Apr 5 2023, 14:15:10) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('hello')
hello
With just those basic things, you can go very far with doit. Most of my use cases revolve around this.
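The stop-on-failure behavior shown earlier in this section also applies to Python actions: per the doit documentation, a Python action that returns False (or raises an exception) marks the task as failed, and the following actions are not run. A minimal sketch, where the settings.toml file name and the check itself are made up for illustration:

```python
from pathlib import Path


def check_config_exists(path="settings.toml"):
    """Return False to tell doit the task failed, True otherwise."""
    if not Path(path).exists():
        print(f"{path} is missing")
        return False
    return True


def task_check_config():
    return {
        # the echo only runs if check_config_exists did not fail
        "actions": [check_config_exists, "echo config looks fine"],
        "verbosity": 2,
    }
```

Returning None (that is, not returning anything) counts as success, which is why the plain functions used so far just work.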
Scaling up
At some point, it's nice to have something that can step up to do some serious work. It turns out doit can be a decent build system too, as it supports parameters, multi-processing, and has a nice DAG between tasks and files.
First, doit has an optional, yet fully functional system of file dependencies and targets:
from pathlib import Path

def task_build_doc():
    return {
        'file_dep': [*Path('docs').rglob('*.rst')],
        # "python -m sphinx" is the module form of the sphinx-build command
        'actions': ["python -m sphinx -b html docs build/html"],
        'targets': ["build/html/index.html"],
    }
This will only run sphinx-build if any *.rst files have changed, or if index.html is missing. Because doit is using Python instead of a DSL, we can use pathlib and unpacking to create the list of things we want to match. All the Python tools are at our disposal.
Internally, doit creates a directed acyclic graph of all the files + tasks each task depends on. Which means you get tree shaking of the whole tree of tasks for free, only running what's necessary. This is all cached (by default in a JSON file, but sqlite is an option), so the performance is quite decent. Not ripgrep level, but very nice for a Python tool. And if none of that suits you, you can pass a custom "should-it-run?" function.
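The custom "should-it-run?" hook goes in the "uptodate" field of the task. Here is a hedged sketch of one: the 24-hour freshness window and the built_recently helper are hypothetical, and the (task, values) signature follows what the doit documentation describes for uptodate callables.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # hypothetical freshness window


def built_recently(task, values):
    """Return True (task is up to date, skip it) if every target
    exists and was modified less than MAX_AGE_SECONDS ago."""
    now = time.time()
    return all(
        Path(target).exists()
        and now - Path(target).stat().st_mtime < MAX_AGE_SECONDS
        for target in task.targets
    )


def task_build_doc():
    return {
        "actions": ["python -m sphinx -b html docs build/html"],
        "targets": ["build/html/index.html"],
        # a list: doit also accepts booleans and other checks here
        "uptodate": [built_recently],
    }
```

The function returns True when the task can be skipped, which is easy to get backwards: "uptodate" answers "is it already done?", not "should it run?".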
When you start to have a lot of things to run, you can tell doit to automatically dispatch all commands through as many processes as you like.
E.g., this will run all the tasks from uuid_and_now, dispatched across 3 processes:
doit -n 3 uuid_and_now
It's possible to run threads instead of multiple processes, in case you have variables you want to share easily, or because you care about I/O but don't have a lot of cores to spare:
doit -n 3 -P thread uuid_and_now
All parameters to doit must be passed right after doit, not after the task name. This is because you can define parameters for each of your tasks.
You can use positional arguments in commands:
def task_format():
    return {
        "actions": [
            "python -m black '%(file_to_format)s'",
            "python -m isort '%(file_to_format)s'",
        ],
        # the value of pos_arg can be any name you want to use in the commands
        "pos_arg": "file_to_format",
    }
doit format dodo.py
. format
All done! ✨ 🍰 ✨
1 file left unchanged.
You can also get params right into Python functions:
import hashlib
from pathlib import Path

def task_hash():
    def print_hashes(files_to_hash):  # files_to_hash is a list
        for path in files_to_hash:
            data = Path(path).read_bytes()
            hex_hash = hashlib.new("blake2s", data).hexdigest()
            print(f"{path}: {hex_hash}")

    def done():
        print("I don't receive the param and I'm fine")

    return {
        "actions": [
            # Don't call the function, just pass the name
            print_hashes, done
        ],
        "pos_arg": "files_to_hash",
        "verbosity": 2
    }
doit hash *.py *.log
. hash
dodo.py: dc5e4c8a0101ef3c404a8c27352bdd6c6514a7a44ffffe18aac5de4d27d4ee1ffd78fef16340e30341b66989a63e5d0dedee6c8bcbce9fd188a61a20456b324b
psync_err.log: cc4434d9b182a9b9366e3271990843de0fb88499df8aebc01da1f4ce96ec426b7f25bee454e4913f47571da1e3588728333c5e0d2c788f28ba0239692fd9c79
I don't receive the param and I'm fine
Needless to say, doit accepts options and flags as well:
def task_hash():
    # hash_type contains the value of the param "--algo", and can be
    # used in any action, be it a Python function or a shell command
    def print_hashes(files_to_hash, hash_type):
        for path in files_to_hash:
            data = Path(path).read_bytes()
            hex_hash = hashlib.new(hash_type, data).hexdigest()
            print(f"{path}: {hex_hash}")

    def done(print_done):
        print('I receive only print_done, but doit handles that')
        if print_done:
            print("Done")

    hash_names = [n.upper().replace("_", " ") for n in hashlib.algorithms_guaranteed]
    possible_hashes = list(zip(hash_names, hashlib.algorithms_guaranteed))
    return {
        "actions": ["echo Using %(hash_type)s", print_hashes, done],
        "pos_arg": "files_to_hash",
        "verbosity": 2,
        "params": [
            {
                "name": "hash_type",  # the param to pass to Python function
                "short": "a",  # allow -a to pass the option
                "long": "algo",  # allow --algo to pass the option
                "default": "blake2s",
                # you can even optionally restrict the acceptable values
                # and types if you feel fancy
                "choices": possible_hashes,
                "type": str,
            },
            {
                "name": "print_done",
                "short": "d",
                "long": "print-done",
                "default": False,
                # if type is bool, doit makes it a flag instead of an option
                "type": bool,
            },
        ],
    }
doit hash --print-done --algo md5 *.py *.log
. hash
Using md5
dodo.py: cf0f99e136829084ccc691f04b840e28
psync_err.log: 3a2d543c402d167cd2a9efb49936ca1f
I receive only print_done, but doit handles that
Done
Options parsing is a bit finicky. You can only pass them before positional arguments, not after.
When your project grows, those features come in handy, but you don't have to use them. They are completely opt-in.
Some are even more niche, like "--report json" that will output something you can parse when you run a command:
doit --report json print_now
{"tasks": [{"name": "print_now", "result": "success", "out": "you can mix\n\n<------------------------------------------------>\n2023-04-21 09:59:39.231405\n\n<------------------------------------------------>\ndone\n", "err": "", "error": null, "started": "2023-04-21 07:59:39.230669", "elapsed": 0.001270294189453125}], "out": "you can mix\n2023-04-21 09:59:39.231405\ndone\n", "err": ""}
There are more goodies in the doc. The point is, doit is one of those tools that has the nice property of being like Python: it makes simple things simple, and complicated things possible. It's not the best at anything. Options parsing is not the most flexible; you have to remember to set the verbosity if you want to print, etc. But it's quite good at a lot of things.
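If you do want to consume that JSON report from another script, the standard json module is enough. This is only a sketch: sample_report is a trimmed-down, hand-written string shaped like the output shown earlier, and summarize is a hypothetical helper, not part of doit.

```python
import json

# Hand-written sample shaped like the report above (values shortened)
sample_report = '''
{"tasks": [{"name": "print_now", "result": "success", "out": "done\\n",
            "err": "", "error": null,
            "started": "2023-04-21 07:59:39.230669", "elapsed": 0.00127}],
 "out": "done\\n", "err": ""}
'''


def summarize(report_text):
    """Map each task name to its (result, elapsed seconds) pair."""
    report = json.loads(report_text)
    return {
        task["name"]: (task["result"], task["elapsed"])
        for task in report["tasks"]
    }


print(summarize(sample_report))
```

In a real pipeline you would feed it the captured standard output of the doit call instead of the sample string.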
Tips and tricks
list and help
You can list all available commands for one project by running:
doit list
build_doc
hash
format
You can get help on any command as well:
doit help hash
hash
-a ARG, --algo=ARG
choices:
BLAKE2B: blake2b
BLAKE2S: blake2s
MD5: md5
SHA1: sha1
SHA224: sha224
SHA256: sha256
SHA3 224: sha3_224
SHA3 256: sha3_256
SHA3 384: sha3_384
SHA3 512: sha3_512
SHA384: sha384
SHA512: sha512
SHAKE 128: shake_128
SHAKE 256: shake_256 (config: hash_type)
-d, --print-done (config: print_done)
If you provide a docstring for the task, it will appear in the help.
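A minimal sketch of what that looks like, reusing the hello task from the beginning (the wording of the docstring is of course up to you):

```python
def task_hello():
    """Print a friendly greeting."""
    return {
        "actions": ["echo hello"],
        "verbosity": 2,
    }
```

Keeping the first line of the docstring short helps, since that is the part meant to be displayed next to the task name.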
Change a few default settings
You can set DOIT_CONFIG at the top of your dodo.py file to apply some parameters to all tasks. I like to use the following:
DOIT_CONFIG = {
    # if you type "doit", print usage
    "default_tasks": [""],
    # use sqlite for caching instead of json
    # you may have to delete .doit.db before
    "backend": "sqlite3",
    # always print everything
    "verbosity": 2
}
Sending parameters to the underlying command
For commands such as pytest, it's convenient to be able to pass options not to the doit task, but directly to pytest itself:
def task_test():
    return {
        "actions": ["pytest tests %(pass_through)s"],
        "pos_arg": "pass_through",
    }
But if you just call:
doit test --ff
--ff won't be passed to pytest as expected, because doit will think it's an option to parse and will tell you:
ERROR: Invalid parameter: "ff". Must be a command, task, or a target.
For it to work, you must put everything behind "--":
doit test -- --ff
. test
This is shell jargon to say "everything after the -- is positional even if it looks like an option". It's not specific to doit, but it's a good tip.
Now, everything you pass after -- will be passed directly to pytest, without being analyzed by doit.
Anecdotally, you can run several doit commands in one call using:
doit task1 task2
But this doesn't work with positional parameters, so I recommend always running one command at a time.
Generating a command on the fly
Any string is a valid action as long as it represents a command you can run. So you can generate a command dynamically:
import datetime as dt

def task_run_on_monday():
    action = "echo nothing to do"
    if dt.date.today().weekday() == 0:
        action = "curl -X POST https://tpsreport.com/delay/week/1"
    return {
        "actions": [action]
    }
Parameters are only passed to actions, not to your "task_" function. But what if you want to create a different command depending on the parameter?
You can opt in to have a parameter passed to your "task_" function instead:
from doit import task_params
import datetime as dt

@task_params([{"name": "force_delay", "default": False, "long": "force-delay", "type": bool}])
def task_run_on_monday(force_delay):
    action = "echo nothing to do"
    if dt.date.today().weekday() == 0 or force_delay:
        action = "curl -X POST https://tpsreport.com/delay/week/1"
    return {
        "actions": [action]
    }
You can now do:
doit run_on_monday --force-delay
This delays the TPS reports even if it's not Monday.
Avoid putting costly logic in "task_" functions. All of them run every time you call "doit", because "doit" needs to create the graph of dependencies. Put costly logic in the actions themselves, which run only when the task name is passed to doit.
E.g.:
doit format
This runs ALL "task_" functions, but only the actions of task_format.
If the logic to generate a command is costly, put it in a function, and wrap it in CmdAction:
import datetime as dt
from doit.action import CmdAction
import time

def task_run_on_monday():
    def generate_action():
        time.sleep(5)  # you don't want this to run for every command!
        if dt.date.today().weekday() == 0:
            return "curl -X POST https://tpsreport.com/delay/week/1"
        return "echo nothing to do"

    return {
        "actions": [CmdAction(generate_action)]
    }
This will only call generate_action if run_on_monday is called, but will still dynamically generate the command.
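Commands are not the only thing you can generate. A "task_" function can also yield several dictionaries, each with its own "name", and doit turns them into sub-tasks. A small sketch, assuming you want one compile check per Python file in the current directory (py_compile comes from the standard library; the task name "compile" is made up):

```python
from pathlib import Path


def task_compile():
    """Byte-compile each Python file of the current directory."""
    for path in sorted(Path(".").glob("*.py")):
        yield {
            # sub-tasks are addressed as "compile:<name>"
            "name": path.name,
            # each sub-task only reruns when its own file changed
            "file_dep": [path],
            "actions": [f"python -m py_compile {path}"],
        }
```

"doit compile" then runs every sub-task, while "doit compile:dodo.py" runs only the one for dodo.py. Like any "task_" function, the loop itself runs on every doit call, so keep it cheap.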
Interactive() but for Python functions
I talked about Interactive before, but it's a very important class.
Because doit swallows stdin/stdout, things like output colors will be stripped. So be very liberal with your use of Interactive. Better have it for nothing than not have it when you need it. But it only works for actions that are commands, not Python functions.
However, you will notice that in your Python function as well, you need this feature. E.G.: pdb.set_trace() or breakpoint() will disappear and hang the task because you don't have access to stdin/stdout.
For those reasons, there is PythonInteractiveAction:
import hashlib
from pathlib import Path
from doit.tools import PythonInteractiveAction

def task_hash():
    def print_hashes(files_to_hash):  # files_to_hash is a list
        for path in files_to_hash:
            data = Path(path).read_bytes()
            hex_hash = hashlib.new("blake2s", data).hexdigest()
            print(f"{path}: {hex_hash}")

    def done():
        print("I don't receive the param and I'm fine")

    return {
        "actions": [
            PythonInteractiveAction(print_hashes), done
        ],
        "pos_arg": "files_to_hash",
        "verbosity": 2
    }
Now you can use breakpoints in print_hashes.
A real-life dodo file
import json
import secrets
import string
from pathlib import Path

import doit
from doit import tools
from doit.tools import Interactive, LongRunning
from fabric import Connection

from my_project.website.crawler import run_crawler
from my_project.website.utils import setup_var_directories

BASE_DIR = Path(__file__).absolute().parent
DIST_DIR = "./var/dist/"
BUILD_DIR = "./var/tmp/shiv/"

DOIT_CONFIG = {
    "default_tasks": [""],
    "backend": "sqlite3",
    # note to readers: This allows me to use a better format syntax
    # for params instead of %(param_name)s
    "action_string_formatting": "new",
}

def task_setup_var_directories():
    return {
        "actions": [
            setup_var_directories,
        ],
    }

def task_lock_deps():
    """Lock dependencies using poetry, and export them as requirements files"""
    return {
        "file_dep": ["pyproject.toml"],
        "actions": [
            "poetry lock",
            "poetry export --without-hashes -f requirements.txt --output requirements.txt",
            "poetry export --without-hashes --dev -f requirements.txt --output all-requirements.txt",
            "grep -Fvxf requirements.txt all-requirements.txt > dev-requirements.txt",
            "rm all-requirements.txt",
        ],
        "targets": ["requirements.txt", "dev-requirements.txt", "poetry.lock"],
    }

def task_precommit():
    """Lint, format and test before commit"""
    return {
        "file_dep": [*Path("my_project").rglob("**/*.py")],
        "actions": [
            "black my_project tests",
            "pylint my_project tests",
            "mypy my_project tests",
            "pytest tests",
        ],
    }

def task_bundle_static_files():
    bundle = str(BASE_DIR / "var/static/bundle.js")
    static_files = Path("my_project/website/static/")
    deps = [
        *static_files.rglob("**/*.js"),
        *static_files.rglob("**/*.css"),
        *Path("my_project/frontend/src/").rglob("**/*.vue"),
    ]

    def update_manifest_json():
        # Because of a bug in vite, we can't use main.js as an entry point
        # and must use a fake index.html. But a script can't load html files.
        # So we do a little hack, duplicate the manifest index.html entry, but
        # name the copy main.js
        conf = Path("var/static/manifest.json")
        data = json.loads(conf.read_text())
        data["./src/main.js"] = data["index.html"]
        conf.write_text(json.dumps(data))

    return {
        "file_dep": deps,
        "actions": [
            "mkdir -p ./var/static/",
            "rm -fr ./var/static/* ",
            "cd my_project/frontend/; npm run build",
            "python manage.py collectstatic --noinput",
            update_manifest_json,
            "cp -r ./var/static/* ",
        ],
        "targets": ["var/static/manifest.json"],
    }

def task_dump_dependencies():
    return {
        "file_dep": ["pyproject.toml"],
        "actions": [
            f"mkdir -p {BUILD_DIR}",
            f"rm -fr {BUILD_DIR}*",
            f"pip install -r requirements.txt --target {BUILD_DIR}",
        ],
    }

def task_build_zipapp():
    return {
        "task_dep": ["bundle_static_files", "lock_deps", "dump_dependencies"],
        "actions": [
            # Make sure the directories are there and empty
            f"mkdir -p {DIST_DIR} ",
            f"rm -fr {DIST_DIR}*",
            f"mkdir -p {BUILD_DIR}var/",
            # Put the python project, deps and static files in there
            f"cp -r var/static/ {BUILD_DIR}var/",
            f"cp -r shiv_entry_point.py manage.py my_project {BUILD_DIR} ",
            # Build the zipapp
            f"shiv --site-packages {BUILD_DIR} --compressed -p '/usr/bin/env python3' -o {DIST_DIR}my_project.pyz -e shiv_entry_point.main",
        ],
        # f-string needed here: targets are not formatted by doit
        "targets": [f"{DIST_DIR}my_project.pyz"],
        "verbosity": 2,
    }

def upload_setup_and_restart():
    with Connection("contact@domain.tld") as c:
        print("Upload pyz")
        c.put(f"{DIST_DIR}my_project.pyz", "/opt/my_project")
        print("Setup prod")
        c.run("cd /opt/my_project && source .env && python3.9 my_project.pyz migrate_prod")
        print("Restart python process")
        c.run("sudo service my_project restart", pty=True)

def task_deploy():
    return {
        "actions": [
            upload_setup_and_restart,
        ],
        "verbosity": 2,
    }

def task_build_and_deploy():
    return {
        "task_dep": ["build_zipapp"],
        "actions": [upload_setup_and_restart],
        "verbosity": 2,
    }

# note to readers: LongRunning is a less costly version of Interactive
# for processes you don't expect to interact with
def task_serve():
    return {
        "actions": [LongRunning("python manage.py runserver")],
    }

def task_vite():
    return {
        "actions": [LongRunning("cd my_project/frontend; npm run dev")],
    }

def task_crawl():
    return {
        "actions": [LongRunning(run_crawler)],
        "verbosity": 2,
    }

def task_shell():
    return {
        "actions": [Interactive("python manage.py shell_plus")],
    }

def task_load_test_users():
    return {
        "actions": [
            "python ./manage.py loaddata tests/fixtures/users.json",
        ],
    }

def task_generate_password():
    def gen():
        alphabet = string.ascii_letters + string.digits
        password = "".join(secrets.choice(alphabet) for i in range(20))
        return {"password": password}

    return {"actions": [gen]}

def task_create_pg_db():
    return {
        "actions": [
            Interactive('sudo -u postgres psql -c "create database {db}"'),
            Interactive(
                """sudo -u postgres psql -c "create user {db} with encrypted password '{password}'" """
            ),
            Interactive(
                'sudo -u postgres psql -c "grant all privileges on database {db} to {db}"'
            ),
            "echo Database '{db}' created for user '{db}' with password '{password}'",
        ],
        "params": [{"name": "db", "default": "my_project", "long": "db"}],
        "getargs": {
            "password": ("generate_password", "password"),
        },
        "verbosity": 2,
    }