What's up Python? Django gets background tasks, a new Python REPL, bye bye gunicorn...
June, 2024
Summary
Built-in background tasks incoming from Django. Finally.
A wild REPL appears that doesn't require installing anything in a venv.
Uvicorn gets multiprocessing support, ditching gunicorn.
PyPI is blocking outlook.com emails. Sorry.
Built-in background tasks incoming from Django
Django has been in the process of modernizing over the last few years. It's moving more and more parts of the framework to be compatible with asyncio, it's adding hooks to use modern SQLite features, and now we have DEP 14 (a Django Enhancement Proposal, the PEP equivalent for this project) proposing to include built-in background tasks.
And it's about time.
Django is typically used in multiprocessing settings, which in Python means it's hard to delegate background tasks to a pool without saturating userspace with even more processes. So at best, you could use threads, which is not ideal in the language. That's an annoying limitation of the platform, and it means any serious project will eventually set up a task queue: to send emails, to batch process, to make HTTP requests to APIs, etc.
Hence the popularity of celery, dramatiq, rq, huey, etc. I use the latter.
But you still have to make sure you don't lock your DB by mistake, be careful not to pass full ORM objects as params, figure out integration and how to get task results and errors back, set up task progress yourself, make hard choices on transience, decide if you need priority and named queues, etc.
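For the record, here is roughly what that dance looks like today with celery. This is a minimal sketch, glossing over the Django/Celery integration boilerplate and assuming a Redis broker running locally; the names (myproject, send_welcome_email) are mine, not from any real project. The key habit it shows: pass the pk, never the ORM object.

# tasks.py, a minimal celery sketch (assumes a local Redis broker)
from celery import Celery
from django.contrib.auth import get_user_model
from django.core.mail import send_mail

app = Celery("myproject", broker="redis://localhost:6379/0")

@app.task
def send_welcome_email(user_id):
    # Re-fetch the object inside the task instead of serializing the ORM instance
    user = get_user_model().objects.get(pk=user_id)
    send_mail("Welcome!", "Thanks for signing up.", "noreply@example.com", [user.email])

# In a view: send_welcome_email.delay(user.pk)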
If Django has been good at one thing in the past, it's providing sane defaults for hard topics like CSRF protection, auth or database connections. So I love that they're going in that direction for task queues.
As usual, they don't start from scratch but will integrate an existing external project that served as a proof of concept. In this case, django-tasks, which supports multiple queues, multiple backends, task priority, result storage, and immediate mode for testing. Unfortunately, it also only ships a DB backend for prod at the moment, which means either you deploy a full DBMS, or you are limited to SQLite and a single worker. Given that Redis now has official support in Django, this will probably change. Also, if WAL and a few tweaks get added to default SQLite, it could be used with more workers. Given that the proposal initially came from the wagtail community, a bunch of really active and pragmatic devs, I'm following the story with high interest.
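As an aside, the WAL tweak in question is something you can already apply by hand today. A minimal sketch using Django's connection_created signal (put it in an app's ready() hook or wherever your signals live; the handler name is mine):

from django.db.backends.signals import connection_created
from django.dispatch import receiver

@receiver(connection_created)
def enable_sqlite_wal(sender, connection, **kwargs):
    # WAL lets readers and a writer work concurrently, which is what
    # several task workers hitting the same SQLite file would need.
    if connection.vendor == "sqlite":
        with connection.cursor() as cursor:
            cursor.execute("PRAGMA journal_mode=WAL;")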
So what does it look like?
You write a few settings in the typical Django fashion:
TASKS = {
    "test": {
        "BACKEND": "django.tasks.backends.ImmediateBackend",
        "QUEUES": [],
        "OPTIONS": {},
    },
    "default": {
        "BACKEND": "django.tasks.DatabaseBackend",
        "QUEUES": [],
        "OPTIONS": {},
    },
}
You define your task (probably in tasks.py in your app):
from django.tasks import task

@task()
def do_something_slow(param1, param2) -> int:
    ...
And you send that to the queue from your view (Django’s name for an endpoint):
# Send the task to the default backend
result = do_something_slow.enqueue("value1", param2="value2")

# Send with a specific priority or backend
result = do_something_slow.using(priority=10, backend="test").enqueue()
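The enqueue() call returns a result object, which is how you're supposed to get the status and return value back. Going by the DEP and the PoC's README at the time of writing, it looks roughly like this; treat the exact attribute names as assumptions, they may well change before anything lands in core:

result.id            # an identifier you can store somewhere
result.status        # e.g. NEW, RUNNING, FAILED, COMPLETE
result.refresh()     # re-read the state from the backend
result.return_value  # the task's return value, once it has finished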
I do expect plenty of issues in the first version, though. The initial PoC is simplistic and queues are full of dragons. It doesn't have a way to schedule tasks cron-style, and the debate about having that in v1 is still ongoing. I see no high-level API to fetch a task result once you lose the reference to that result object, a glaring omission that some other tools have taught us is painful. How it's going to let you debug a crashed task is still not resolved. Also, no mention of transience, retry, ACK, and all those nasty little buggers that are difficult to nail, and super context-specific.
Baby steps, I say.
And yes, I know, we wouldn't have this problem in Go, Erlang, Rust, Java, C#, you name it.
But we do. And so do Ruby, PHP and JS. Laravel has a solution to the problem, and it's a good thing we follow. The Python ecosystem gains a lot by adopting goodies from other languages. I'm still hoping we steal object destructuring from JS.
Of course, I invite you to be part of the testing and the discussion, that's how it will become good. And thanks to the devs that will support this heavy project.
A wild REPL appears
Recently I posted about jupyterlab-desktop, a stand-alone version of the excellent jupyter that solves several problems of the original tool:
It doesn't require you to create a venv to install it.
It lets you manage several venvs after the fact.
It hides the logic of running a server from you and just works.
It starts with a click on an icon and not a terminal command.
It's a big improvement to the ergonomics of a tool that many beginners use.
But it still has one major flaw: you need to install jupyterlab in each venv you use it with. It's a big problem because jupyter has a lot of dependencies. It pulls 645 packages! And none of that is something you want in prod, so the delta between your dev machine and your server becomes huge.
Today the VSCode team announced a new Python REPL that you can activate by putting "python.REPL.sendToNativeREPL": true into your settings.json.
For now, it's an MVP, so it's not the easiest thing to start: you have to select a Python line and type Shift+Enter.
But it already provides:
Venv integration.
Multi-line editing.
Syntax highlighting, completion and snippets.
Many IntelliSense features.
And above all: it installs NOTHING in your venv to work.
It's not a full jupyterlab substitute. Rather an ipython-qtconsole on steroids. But that's how a lot of people use jupyter. You'll need a few things to make it really practical, like a button to open it, inline plot display, some shortcut to get a cell's content back into the prompt... Still, it's already quite nice.
Uvicorn gets multiprocessing support
To run WSGI frameworks like Django and Flask, we often use gunicorn, a server that stands between your code and whatever big name like Nginx or Apache you choose. Because WSGI is synchronous, it spawns multiple processes to handle HTTP requests in parallel.
The equivalent in the asynchronous world, used by FastAPI and the like, is uvicorn, which runs in a single process, using asyncio to handle concurrency.
If you want both, you have to use gunicorn to manage uvicorn, which loads your code, the whole thing being served by your reverse proxy. That's a lot of intermediaries.
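Concretely, the usual prod incantation looks something like this (assuming your ASGI app is importable as myproject.asgi:application, a name I'm making up for the example):

gunicorn myproject.asgi:application -k uvicorn.workers.UvicornWorker --workers 4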
And that's just for prod; for dev, projects may run a separate server.
With version 0.30, uvicorn brings a manager supporting multiple processes, meaning you can ditch gunicorn as a dep. I think the whole community will likely slowly, but surely, move to uvicorn for everything. Well, maybe not Django, since they hate external deps.
Indeed, uvicorn can now spawn multiple processes, each of them handling async requests (including websockets and HTTPS), and auto-reload on file change. All that can be pretty fast, since you can pip install uvloop to make it use libuv, the event loop implementation behind Node.js, and get a speed boost.
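If you prefer launching it from Python rather than the CLI, a minimal sketch, again assuming the hypothetical myproject.asgi:application import string (with more than one worker, the app has to be passed as a string, not an object):

import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "myproject.asgi:application",  # import string, required when workers > 1
        host="0.0.0.0",
        port=8000,
        workers=4,      # handled by the new built-in multiprocess manager
        loop="uvloop",  # only if uvloop is installed
    )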
Hell, give it TLS certificate generation and it could even play Caddy's game and own the whole stack one day.
My favorite part of the release note:
Nothing needs to be done from the users side, the changes are already in place when using the --workers parameter.
PyPI is blocking outlook.com emails
I'm just going to quote Ee Durbin, the PyPI Director of Infrastructure, since I won't say it any better:
In response to ongoing mass bot account registrations, Outlook domains outlook.com and hotmail.com have been prohibited from new associations with PyPI accounts. This includes new registrations as well as adding as additional addresses. [...] In a campaign today which included over 160 projects and associated new user registrations, the accounts were registered using outlook.com and hotmail.com email addresses. Past campaigns of similar scale have had similar characteristics. This indicates to us that the Outlook email services are falling short of other major email providers in prevention of automated, bot, and bulk signups for new accounts.
It's easy to forget that PyPI is constantly under attack and requires a tremendous maintenance effort at great cost. I make it one of my missions to regularly remind people not to take all those wonderful gifts for granted. We are living on the shoulders of generous FOSS giants.
That won't make it easier to swallow for legitimate people with outlook.com emails, though. I know the struggle: I have a ".email" TLD on my personal address and get rejected by services sometimes.
I trust they will revert it as soon as it's practical for them:
We hope that this change does not need to be permanent, given our current capacity for response and tooling it is the next step that we currently have.