Summary
doit is a sweet Python-based task runner and, optionally, a decent build system. It scales down, making simple things simple:
def task_print_uuid():
    return {
        "actions": [
            "grep UUID /etc/fstab | cut -d' ' -f1 | cut -d'=' -f2",
            "echo done"
        ],
        "verbosity": 2
    }
But it also scales up. Should the need arise, it offers file and task dependencies, a DAG with a cache, lets you define parameters for each task, and can even generate tasks on the fly.
doit for you
The more projects you work on, the more you can enjoy tooling. But with a lot of tooling comes the need to abstract it away. This is why task runners and build systems multiplied like blog engines, only before blogs were invented.
Now you have the choice between make, cmake, gulp, ninja, maven, gradle, bazel and numerous others.
I tried many and always felt frustrated: some of those tools are not cross-platform, some are very hard to set up, some don't deal with dependencies, some are heavy and some have a weird DSL.
It's like nothing was at the right level of compromise: either too low level or too abstracted away, too big or too small, too simple or too complicated, too niche or too corporate. I'm not Google, but I'm not a student either.
Is there anything for an honest coder-plumber?
And one day I tried one more: pydoit.org.
It's not perfect, but it's productive.
It's not overpowered, but it can do the daily stuff.
It's not Usain Bolt, but it gets there in time.
So I kept at it.
Despite the numerous quirks of this little tool, it has fewer dark corners than the competition, and it was consistently a boost to each project I introduced it to. The cost was minimal. Colleagues picked it up without much fuss.
It didn't win a prize for the most amazing tool in the world.
But it's nice.
Try it.
What does doit do?
Mostly, it's a cli tool to run stuff.
You create a task, you give it a name plus a bunch of actions. The actions are commands and/or Python functions to run. Then when you call the name, they run.
At first, you use it because, who remembers all the options you need to pass to sphinx to build the doc? Or because it's easier than finding out how to set up PYTEST_ADDOPTS for the whole team. Or because you have a different port for each project's dev server and you are fed up with fiddling to find it again every time you switch.
So you create a "build_doc", "test", and "dev_server" commands.
Then you are pleased to realize all your projects with different stacks have the exact same workflow now. This is nice.
Then you start stacking commands into commands. How about "build" calling "test", "build_doc", then "make_package"?
Ah, the CI has become simpler: it just installs stuff and calls doit now.
Then you realize you wish you were not always building the doc when you make the package. Sometimes the doc hasn't changed. But it turns out you can add a few lines to declare dependencies on changed files.
Pure Python syntax, no PHONY to forget, no compiler-related flags, works on Windows, Mac and Linux… There is a lot to like here.
Doing it
I would say just "pip install doit", but if you have read this blog before, you know I would rather tell you to activate your virtual environment and run "python -m pip install doit".
Now create a "dodo.py" file at the root of your project. Yes, "dodo.py". Don't ask.
And put the following in it:
def task_hello():
    return {
        "actions": ['echo hello'],
    }
And now run:
python -m doit hello
You will get:
python -m doit hello
. hello
Congrats, you just did it. You created and ran your first task.
I encourage you to use "python -m doit" just in case your PATH is all broken, but if you are in a virtual environment, calling "doit" alone is usually fine.
So for the rest of the tutorial, I will simply use "doit" instead of "python -m doit".
Scaling down
I like systems that scale up, but also scale down. Things that I can use with my clients, but also on a dumb personal project.
doit does this very well: if all you want is to run a bunch of commands, you don't need much:
def task_do_stuff():
    return {
        "actions": [
            'command one',
            'command two',
            'command three'
        ],
    }
It will run in your system shell, so you can use the syntax you are used to. It even deals with pipes transparently.
By default, it doesn't show the output and just runs things. But you can set the verbosity to see what's happening:
def task_print_uuid():
    return {
        "actions": [
            "grep UUID /etc/fstab | cut -d' ' -f1 | cut -d'=' -f2",
            "echo done"
        ],
        "verbosity": 2
    }
Then you can call “doit the_task_name”:
doit print_uuid
. print_uuid
10654268-0809-4f8f-8f8e-e1a2811e1688
7C68-250E
23fa1a22-ab1f-418d-b00f-363ca8f92261
69306B771D56D4AA
done
Of course, I'm on Linux, so this is bash. But on Windows you can run cmd.exe commands.
However, what about having something cross platform?
Well, you can pass Python functions as actions:
import datetime as dt

def now():
    print(dt.datetime.now())

def task_print_now():
    return {
        "actions": ["echo you can mix", now, "echo done"],
        "verbosity": 2
    }
It's based on conventions. Everything prefixed with "task_" is a task, everything else is a regular function. And you can use any regular function as an action.
This gives you:
doit print_now
. print_now
you can mix
2023-04-21 07:34:44.499141
done
Tasks can depend on other tasks, so you can compose:
def task_uuid_and_now():
    return {
        # names of the tasks as strings, without the "task_" prefix
        "task_dep": ["print_uuid", "print_now"],
        "actions": ["echo done and done"],
        "verbosity": 2,
    }
doit uuid_and_now
. print_uuid
10654268-0809-4f8f-8f8e-e1a2811e1688
7C68-250E
23fa1a22-ab1f-418d-b00f-363ca8f92261
69306B771D56D4AA
done
. print_now
you can mix
2023-04-21 07:40:18.392034
done
. uuid_and_now
done and done
The default behavior in case of failure is sane: it will not, unlike a shell script, just carry on:
def task_fail():
    return {
        "actions": ["echo start", "%£*,;@f", "echo never reached"],
        "verbosity": 2,
    }
doit fail
. fail
start
TaskError - taskid:fail
CmdAction Error creating command string
Traceback (most recent call last):
File "/home/user/.local/lib/python3.10/site-packages/doit/action.py", line 201, in execute
action = self.expand_action()
File "/home/user/.local/lib/python3.10/site-packages/doit/action.py", line 315, in expand_action
return self.action % subs_dict
ValueError: unsupported format character '?' (0xa3) at index 1
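If you do want the remaining tasks to keep going after a failure, doit accepts a --continue flag:
doit --continue fail print_now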
If you want to run a long-running command, like a shell or a server, you can mark one particular command so that doit doesn't expect it to stop and lets you type into prompts:
from doit.tools import Interactive

def task_python():
    return {
        "actions": [
            "echo this is useless but ok",
            Interactive("python3.10"),
        ],
        "verbosity": 2,
    }
doit python
. python
this is useless but ok
Python 3.10.11 (main, Apr 5 2023, 14:15:10) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('hello')
hello
With just those basic things, you can go very far with doit. Most of my use cases revolve around this.
Scaling up
At some point, it's nice to have something that can step up to do some serious work. It turns out doit can be a decent build system too, as it supports parameters, multi-processing and has a nice DAG between tasks and files.
First, doit has an optional, yet fully functional, system of file dependencies and targets:
from pathlib import Path

def task_build_doc():
    return {
        'file_dep': [*Path('docs').rglob('*.rst')],
        'actions': ["python -m sphinx -b html docs build/html"],
        'targets': ["build/html/index.html"],
    }
This will only run sphinx if an RST file has changed, or if index.html is missing. Because doit uses Python instead of a DSL, we can use pathlib and unpacking to create the list of files we want to match. All the Python tools are at our disposal.
Internally, doit creates a directed acyclic graph of all the files and tasks each task depends on. This means you get tree shaking of the whole graph of tasks for free, only running what's necessary. It's all cached (by default in a JSON file, but SQLite is an option), so performance is quite decent. Not ripgrep level, but very nice for a Python tool. And if none of that suits you, you can pass a custom "should-it-run?" function.
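That custom check goes in the "uptodate" key, which accepts booleans, callables or helpers like doit.tools.config_changed. A minimal sketch (the SETTINGS dict and the weekend rule are made up for the example):
import datetime as dt

from doit.tools import config_changed

SETTINGS = {"theme": "dark", "lang": "en"}

def is_weekend():
    # made-up custom check: report the task as up to date
    # on weekends, so doit skips it
    return dt.date.today().weekday() >= 5

def task_render():
    return {
        "actions": ["echo rendering..."],
        # reruns if SETTINGS changed since the last run,
        # or on any day that is not a weekend
        "uptodate": [config_changed(SETTINGS), is_weekend],
        "verbosity": 2,
    }
If every entry in "uptodate" returns True, the task is considered up to date and skipped.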
When you start to have a lot of things to run, you can tell doit to automatically dispatch all commands through as many processes as you like.
E.g., this will run all the tasks from uuid_and_now, dispatched across 3 processes:
doit -n 3 uuid_and_now
It's possible to run threads instead of multiple processes in case you have variables you want to share easily, or because you care about I/O but don't have a lot of cores to spare:
doit -n 3 -P thread uuid_and_now
All parameters to doit itself must be passed right after doit, not after the task name. This is because you can define parameters for each of your tasks.
You can use positional arguments in commands:
def task_format():
    return {
        "actions": [
            "python -m black '%(file_to_format)s'",
            "python -m isort '%(file_to_format)s'",
        ],
        # the value of pos_arg can be any name you want to use in the commands
        "pos_arg": "file_to_format",
    }
doit format dodo.py
. format
All done! ✨ 🍰 ✨
1 file left unchanged.
You can also use it in Python functions:
import hashlib
from pathlib import Path

def task_hash():
    def print_hashes(files_to_hash):  # files_to_hash is a list
        for path in files_to_hash:
            data = Path(path).read_bytes()
            hex_hash = hashlib.new("blake2s", data).hexdigest()
            print(f"{path}: {hex_hash}")

    def done():
        print("I don't receive the param and I'm fine")

    return {
        "actions": [
            # Don't call the functions, just pass the names
            print_hashes, done
        ],
        "pos_arg": "files_to_hash",
        "verbosity": 2
    }
doit hash *.py *.log
. hash
dodo.py: dc5e4c8a0101ef3c404a8c27352bdd6c6514a7a44ffffe18aac5de4d27d4ee1ffd78fef16340e30341b66989a63e5d0dedee6c8bcbce9fd188a61a20456b324b
psync_err.log: cc4434d9b182a9b9366e3271990843de0fb88499df8aebc01da1f4ce96ec426b7f25bee454e4913f47571da1e3588728333c5e0d2c788f28ba0239692fd9c79
I don't receive the param and I'm fine
Needless to say, doit accepts options and flags as well:
import hashlib
from pathlib import Path

def task_hash():
    # hash_type contains the value of the param "--algo", and can be
    # used in any action, be it a Python function or a shell command
    def print_hashes(files_to_hash, hash_type):
        for path in files_to_hash:
            data = Path(path).read_bytes()
            hex_hash = hashlib.new(hash_type, data).hexdigest()
            print(f"{path}: {hex_hash}")

    def done(print_done):
        print('I receive only print_done, but doit handles that')
        if print_done:
            print("Done")

    hash_names = [n.upper().replace("_", " ") for n in hashlib.algorithms_guaranteed]
    possible_hashes = list(zip(hash_names, hashlib.algorithms_guaranteed))

    return {
        "actions": ["echo Using %(hash_type)s", print_hashes, done],
        "pos_arg": "files_to_hash",
        "verbosity": 2,
        "params": [
            {
                "name": "hash_type",  # the param to pass to Python functions
                "short": "a",  # allow -a to pass the option
                "long": "algo",  # allow --algo to pass the option
                "default": "blake2s",
                # you can even optionally restrict the acceptable values
                # and types if you feel fancy
                "choices": possible_hashes,
                "type": str,
            },
            {
                "name": "print_done",
                "short": "d",
                "long": "print-done",
                "default": False,
                # if type is bool, doit makes it a flag instead of an option
                "type": bool,
            },
        ],
    }
doit hash --print-done --algo md5 *.py *.log
. hash
Using md5
dodo.py: cf0f99e136829084ccc691f04b840e28
psync_err.log: 3a2d543c402d167cd2a9efb49936ca1f
I receive only print_done, but doit handles that
Done
Options parsing is a bit finicky. You can only pass them before positional arguments, not after.
When your project grows, those features come in handy, but you don't have to use them. They are completely opt-in.
Some are even more niche, like "--report json" that will output something you can parse when you run a command:
doit --report json print_now
{"tasks": [{"name": "print_now", "result": "success", "out": "you can mix\n\n<------------------------------------------------>\n2023-04-21 09:59:39.231405\n\n<------------------------------------------------>\ndone\n", "err": "", "error": null, "started": "2023-04-21 07:59:39.230669", "elapsed": 0.001270294189453125}], "out": "you can mix\n2023-04-21 09:59:39.231405\ndone\n", "err": ""}$
There are more goodies in the doc. The point is, doit is one of those tools that has the nice property of being like Python: it makes simple things simple and complicated things possible. It's not the best at anything: options parsing is not the most flexible, you have to remember to set the verbosity if you want to print, etc. But it's quite good at a lot of things.
Tips and tricks
list and help
You can list all available commands for one project by running:
doit list
build_doc
hash
format
You can get help on any command as well:
doit help hash
hash
  -a ARG, --algo=ARG
      choices:
          BLAKE2B: blake2b
          BLAKE2S: blake2s
          MD5: md5
          SHA1: sha1
          SHA224: sha224
          SHA256: sha256
          SHA3 224: sha3_224
          SHA3 256: sha3_256
          SHA3 384: sha3_384
          SHA3 512: sha3_512
          SHA384: sha384
          SHA512: sha512
          SHAKE 128: shake_128
          SHAKE 256: shake_256 (config: hash_type)
  -d, --print-done (config: print_done)
If you provide a docstring for the task, it will appear in the help.
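For instance, amending the hash task from before:
def task_hash():
    """Print the hash of the given files."""
    ...
The first line of the docstring also shows up next to the task name when you run "doit list".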
Change a few default settings
You can set DOIT_CONFIG at the top of your dodo.py file to apply some parameters to all tasks. I like to use the following:
DOIT_CONFIG = {
    # if you type "doit", print usage
    "default_tasks": [""],
    # use sqlite for caching instead of json
    # (you may have to delete .doit.db first)
    "backend": "sqlite3",
    # always print everything
    "verbosity": 2
}
Sending parameters to the underlying command
For commands such as pytest, it's convenient to be able to pass options not to the doit task, but directly to pytest itself:
def task_test():
    return {
        "actions": ["pytest tests %(pass_through)s"],
        "pos_arg": "pass_through",
    }
But if you just call:
doit test --ff
then --ff won't be passed to pytest as expected, because doit will think it's an option to parse and will tell you:
ERROR: Invalid parameter: "ff". Must be a command, task, or a target.
For it to work, you must put everything behind "--":
doit test -- --ff
. test
This is shell jargon to say "everything after the -- is positional even if it looks like an option". It's not specific to doit, but it's a good tip.
Now, everything you pass after -- will be passed directly to pytest, without being analyzed by doit.
Anecdotally, you can run several doit commands in one call using:
doit task1 task2
But this doesn't work with positional parameters, so I recommend just always running one command at a time.
Generating a command on the fly
Any string is a valid action as long as it represents a command you can run. So you can generate a command dynamically:
import datetime as dt

def task_run_on_monday():
    action = "echo nothing to do"
    if dt.date.today().weekday() == 0:
        action = "curl -X POST https://tpsreport.com/delay/week/1"
    return {
        "actions": [action]
    }
Parameters are only passed to actions, not to your "task_" function. But what if you want to create a different command depending on the parameter?
You can opt in to have a parameter passed to your "task_" function instead:
from doit import task_params
import datetime as dt

@task_params([{"name": "force_delay", "default": False, "long": "force-delay", "type": bool}])
def task_run_on_monday(force_delay):
    action = "echo nothing to do"
    if dt.date.today().weekday() == 0 or force_delay:
        action = "curl -X POST https://tpsreport.com/delay/week/1"
    return {
        "actions": [action]
    }
You can now do:
doit run_on_monday --force-delay
To delay the TPS reports even if it's not Monday.
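Commands aside, doit can also generate whole tasks on the fly, as teased in the summary: write the "task_" function as a generator and yield one sub-task dict per item, each with its own "name". A minimal sketch (the log files and the gzip command are just for illustration):
from pathlib import Path

def task_compress():
    # one sub-task per log file; run them all with "doit compress",
    # or a single one with "doit compress:app.log"
    for path in Path(".").glob("*.log"):
        yield {
            "name": path.name,
            "actions": [f"gzip -k {path}"],
            "file_dep": [str(path)],
            "targets": [f"{path}.gz"],
        }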
Avoid putting costly logic in "task_" functions. All of them run every time you call "doit", because doit needs to create the graph of dependencies. Put costly logic in the actions themselves, which run only when the task name is passed to doit.
E.g.:
doit format
This runs ALL the "task_" functions, but only the actions of task_format.
If the logic to generate a command is costly, put it in a function and wrap it in CmdAction:
import datetime as dt
import time

from doit.action import CmdAction

def task_run_on_monday():
    def generate_action():
        time.sleep(5)  # you don't want this to run for every command!
        if dt.date.today().weekday() == 0:
            return "curl -X POST https://tpsreport.com/delay/week/1"
        return "echo nothing to do"

    return {
        "actions": [CmdAction(generate_action)]
    }
This will only call generate_action if run_on_monday is called, but will still dynamically generate the command.
Interactive() but for Python functions
I talked about Interactive before; it's a very important class. Because doit swallows stdin/stdout, things like output colors will be stripped. So be very liberal with your use of Interactive: better to have it for nothing than not have it when you need it. But it only works for actions that are commands, not Python functions.
However, you will notice that you need this feature in your Python functions as well. E.g., pdb.set_trace() or breakpoint() will disappear and hang the task, because you don't have access to stdin/stdout.
For those reasons, there is PythonInteractiveAction:
import hashlib
from pathlib import Path

from doit.tools import PythonInteractiveAction

def task_hash():
    def print_hashes(files_to_hash):  # files_to_hash is a list
        for path in files_to_hash:
            data = Path(path).read_bytes()
            hex_hash = hashlib.new("blake2s", data).hexdigest()
            print(f"{path}: {hex_hash}")

    def done():
        print("I don't receive the param and I'm fine")

    return {
        "actions": [
            PythonInteractiveAction(print_hashes), done
        ],
        "pos_arg": "files_to_hash",
        "verbosity": 2
    }
Now you can use breakpoints in print_hashes.
A real-life dodo file
I put this one on a gist so you have a bit of color to look at. You can ignore the header of warnings.
import json
import secrets
import string
from pathlib import Path

import doit
from doit import tools
from doit.tools import Interactive, LongRunning
from fabric import Connection

from my_project.website.crawler import run_crawler
from my_project.website.utils import setup_var_directories

BASE_DIR = Path(__file__).absolute().parent
DIST_DIR = "./var/dist/"
BUILD_DIR = "./var/tmp/shiv/"

DOIT_CONFIG = {
    "default_tasks": [""],
    "backend": "sqlite3",
    # note to readers: This allows me to use a better format syntax
    # for params instead of %(param_name)s
    "action_string_formatting": "new",
}

def task_setup_var_directories():
    return {
        "actions": [
            setup_var_directories,
        ],
    }

def task_lock_deps():
    """Lock dependencies using poetry, and export them as requirements files"""
    return {
        "file_dep": ["pyproject.toml"],
        "actions": [
            "poetry lock",
            "poetry export --without-hashes -f requirements.txt --output requirements.txt",
            "poetry export --without-hashes --dev -f requirements.txt --output all-requirements.txt",
            "grep -Fvxf requirements.txt all-requirements.txt > dev-requirements.txt",
            "rm all-requirements.txt",
        ],
        "targets": ["requirements.txt", "dev-requirements.txt", "poetry.lock"],
    }

def task_precommit():
    """Lint, format and test before commit"""
    return {
        "file_dep": [*Path("my_project").rglob("**/*.py")],
        "actions": [
            "black my_project tests",
            "pylint my_project tests",
            "mypy my_project tests",
            "pytest tests",
        ],
    }

def task_bundle_static_files():
    bundle = str(BASE_DIR / "var/static/bundle.js")
    static_files = Path("my_project/website/static/")
    deps = [
        *static_files.rglob("**/*.js"),
        *static_files.rglob("**/*.css"),
        *Path("my_project/frontend/src/").rglob("**/*.vue"),
    ]

    def update_manifest_json():
        # Because of a bug in vite, we can't use main.js as an entry point
        # and must use a fake index.html. But a script can't load html files.
        # So we do a little hack, duplicate the manifest index.html entry, but
        # name the copy main.js
        conf = Path("var/static/manifest.json")
        data = json.loads(conf.read_text())
        data["./src/main.js"] = data["index.html"]
        conf.write_text(json.dumps(data))

    return {
        "file_dep": deps,
        "actions": [
            "mkdir -p ./var/static/",
            "rm -fr ./var/static/* ",
            "cd my_project/frontend/; npm run build",
            "python manage.py collectstatic --noinput",
            update_manifest_json,
            "cp -r ./var/static/* ",
        ],
        "targets": ["var/static/manifest.json"],
    }

def task_dump_dependencies():
    return {
        "file_dep": ["pyproject.toml"],
        "actions": [
            f"mkdir -p {BUILD_DIR}",
            f"rm -fr {BUILD_DIR}*",
            f"pip install -r requirements.txt --target {BUILD_DIR}",
        ],
    }

def task_build_zipapp():
    return {
        "task_dep": ["bundle_static_files", "lock_deps", "dump_dependencies"],
        "actions": [
            # Make sure the directories are there and empty
            f"mkdir -p {DIST_DIR} ",
            f"rm -fr {DIST_DIR}*",
            f"mkdir -p {BUILD_DIR}var/",
            # Put the python project, deps and static files in there
            f"cp -r var/static/ {BUILD_DIR}var/",
            f"cp -r shiv_entry_point.py manage.py my_project {BUILD_DIR} ",
            # Build the zipapp
            f"shiv --site-packages {BUILD_DIR} --compressed -p '/usr/bin/env python3' -o {DIST_DIR}my_project.pyz -e shiv_entry_point.main",
        ],
        "targets": [f"{DIST_DIR}my_project.pyz"],
        "verbosity": 2,
    }

def upload_setup_and_restart():
    with Connection("contact@domain.tld") as c:
        print("Upload pyz")
        c.put(f"{DIST_DIR}my_project.pyz", "/opt/my_project")
        print("Setup prod")
        c.run("cd /opt/my_project && source .env && python3.9 my_project.pyz migrate_prod")
        print("Restart python process")
        c.run("sudo service my_project restart", pty=True)

def task_deploy():
    return {
        "actions": [
            upload_setup_and_restart,
        ],
        "verbosity": 2,
    }

def task_build_and_deploy():
    return {
        "task_dep": ["build_zipapp"],
        "actions": [upload_setup_and_restart],
        "verbosity": 2,
    }

# note to readers: LongRunning is a less costly version of Interactive
# for processes you don't expect to interact with
def task_serve():
    return {
        "actions": [LongRunning("python manage.py runserver")],
    }

def task_vite():
    return {
        "actions": [LongRunning("cd my_project/frontend; npm run dev")],
    }

def task_crawl():
    return {
        "actions": [LongRunning(run_crawler)],
        "verbosity": 2,
    }

def task_shell():
    return {
        "actions": [Interactive("python manage.py shell_plus")],
    }

def task_load_test_users():
    return {
        "actions": [
            "python ./manage.py loaddata tests/fixtures/users.json",
        ],
    }

def task_generate_password():
    def gen():
        alphabet = string.ascii_letters + string.digits
        password = "".join(secrets.choice(alphabet) for i in range(20))
        return {"password": password}

    return {"actions": [gen]}

def task_create_pg_db():
    return {
        "actions": [
            Interactive('sudo -u postgres psql -c "create database {db}"'),
            Interactive(
                """sudo -u postgres psql -c "create user {db} with encrypted password '{password}'" """
            ),
            Interactive(
                'sudo -u postgres psql -c "grant all privileges on database {db} to {db}"'
            ),
            "echo Database '{db}' created for user '{db}' with password '{password}'",
        ],
        "params": [{"name": "db", "default": "my_project", "long": "db"}],
        # getargs pulls the "password" value returned by the
        # generate_password task's action
        "getargs": {
            "password": ("generate_password", "password"),
        },
        "verbosity": 2,
    }