Redis and Python: 20% of effort, 80% of effects
I should rename this blog to paretocode.dev
Redis is one of those technologies, like SQLite or VLC: robust, works out of the box, started by doing one thing well, then added another, until years later it just became so good at so many things.
I love Redis.
But while it's now 15 years old, I'm not going to assume everybody knows what it is, so in short...
Redis has a lot of depth, but it can be first approached as an in-memory key/value store. Meaning, as a database, but with a lot (by default, all) of the data in RAM and not just on disk. And if you like Python, you can picture it as a gigantic dict, which has keys and values.
Redis is more than that, but it's good to start there, because that's what it started from. In fact, Redis stands for "Remote Dictionary Server".
Because of its design, Redis is very fast to query, not just for reading data, but also for writing it. By fast, I don't just mean that it has a low latency (it responds quickly) but also that it can handle heavy loads (many requests at a time).
For this reason it is heavily used in the industry for caching, sharing data between processes, or building task and message queues.
At this stage, if you've never worked with it, you may be thinking that it must be super complicated to use, given how good it is.
But that's the thing.
Redis is one of the easiest technologies to leverage you will ever touch in your life.
It's a god damn miracle.
Installing Redis
The only thing for which you'll have to work a little is setup.
If you are on Linux, Redis is likely part of your distro packages. E.g., you can sudo apt install redis
on Ubuntu. But you'll have to do the job of finding out how to install it for yours.
For Mac, it's available on Homebrew and MacPorts. Choose your poison. I may make a tutorial for beginners on them one day, but we are at the end of the week, so this is not a beginners article, and I'm assuming you have one of either and can use it.
For Windows, since Redis is a Unix tech, you'll need to set it up in WSL. Again, one day I might do an article on WSL, because a good half of Windows devs have never heard of it, but this is not the time and place. If you really don't want to do that, there is an unofficial Windows port that provides an exe. Also, if Windows prompts you to allow some network connection or port, say yes.
Redis, without Python
Once the installation is done (and that was the most annoying part of the deal), that's it. There is no configuration to do, no user account to set up, no password or permission to grant, no table to create.
Out of the box, Redis runs with a fresh empty db named db0, with an open port on 127.0.0.1:6379, and you can connect to it right away.
You can do so with clients from most popular languages, such as PHP, JS, Ruby, Java and of course, Python.
But there is a CLI client provided with the server, which, on Unix, is exposed through the redis-cli
command. It starts a shell that lets you talk to your Redis instance:
$ redis-cli
127.0.0.1:6379>
"127.0.0.1:6379>" is the the prompt, a typical REPL like Python's, except way more basic. Everything starts with a keyword, then parameters.
To set a key "site" with the value "bitecode.dev", you do:
127.0.0.1:6379> set site "bitecode.dev"
OK
To get it back, you do:
127.0.0.1:6379> get site
"bitecode.dev"
Congrats, you know how to use Redis.
Essential Redis features
Nowadays, Redis is packed with awesome and powerful features such as a document database, streams, geospatial queries and time series. The problem is, it's a LOT.
We are not going to see any of that, and will instead focus on the basic things it can do. Don't worry, it's already super useful. Like the title said, it's 20% of effort, because it's easy to understand and implement, but 80% of effects, because you can already use it in so many places.
The Redis binaries are not fat enough for you to feel you are installing bloat you won't use. To give you an idea, I just cloned the full source code and compiled it manually (it's actually easy to compile, which is crazy to me): it's 112 MB. Postgres is about 180 MB, for example.
So, you just saw getting and setting values in the previous section. Just that is useful, because you can use Redis as a distributed cache.
And a cache is better when you can expire the values without having to do the check yourself, which Redis lets you do. Let's tell it to expire the key "site" in 2 seconds:
127.0.0.1:6379> expire site 2
(integer) 1
127.0.0.1:6379> get site # still there
"bitecode.dev"
127.0.0.1:6379> get site # still there
"bitecode.dev"
127.0.0.1:6379> get site # gone
(nil)
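By the way, if you already know the lifetime when you write the value, you can set both in one command with the EX option:

127.0.0.1:6379> set site "bitecode.dev" EX 2
OK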
With this you can already do a lot. You can share data between different WSGI workers. You can save the state of your scripts, and get it back on the next run. You can cache, of course, and put user session information in there for your website. All for the cost of 21 µs per non-optimized lookup on my machine. Pretty good deal for the investment.
Now, there are a few other simple yet very useful things you can use.
The first one is lists: one key, several values. They work like Python lists:
127.0.0.1:6379> RPUSH colors red blue yellow green pink
(integer) 5
127.0.0.1:6379> LINDEX colors 1
"blue"
127.0.0.1:6379> LINDEX colors -1
"pink"
127.0.0.1:6379> RPOP colors
"pink"
127.0.0.1:6379> RPOP colors
"green"
127.0.0.1:6379> RPOP colors
"yellow"
This is the Python equivalent of:
>>> colors = ['red', 'blue', 'yellow', 'green', 'pink']
>>> colors[1]
'blue'
>>> colors[-1]
'pink'
>>> colors.pop()
'pink'
>>> colors.pop()
'green'
>>> colors.pop()
'yellow'
Useful for all your LIFO and FIFO needs, such as queues, stacks, etc. And yeah, they can expire too.
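For instance, a minimal FIFO: push on one side, pop on the other (the jobs key and its content are just made up for the example):

127.0.0.1:6379> RPUSH jobs "resize_image" "send_email"
(integer) 2
127.0.0.1:6379> LPOP jobs
"resize_image"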
Redis also has hashes, which act like dicts (within the giant dict):
127.0.0.1:6379> HSET user name "Alice" age "30" country "Wonderland"
(integer) 3
127.0.0.1:6379> HGETALL user
1) "name"
2) "Alice"
3) "age"
4) "30"
5) "country"
6) "Wonderland"
127.0.0.1:6379> HGET user name
"Alice"
They can expire as well. Just the entire dict though, not individual fields.
And then, since this is a Python blog, let's match all Python collections and look at sets:
127.0.0.1:6379> SADD stuff 1 2 3 1 2 3 1 1 1 1 3 3 2
(integer) 3
127.0.0.1:6379> SMEMBERS stuff # No duplicates
1) "1"
2) "2"
3) "3"
127.0.0.1:6379> SISMEMBER stuff 3
(integer) 1
127.0.0.1:6379> SISMEMBER stuff 99
(integer) 0
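Roughly the Python equivalent of (note that Redis stores the members as strings):

>>> stuff = {1, 2, 3, 1, 2, 3, 1, 1, 1, 1, 3, 3, 2}
>>> stuff
{1, 2, 3}
>>> 3 in stuff
True
>>> 99 in stuff
False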
There are more, of course, a lot more: sorted sets, bitmaps, HyperLogLog, pub/sub, transactions…
But you know what? Just with what we saw, you can already do a lot. In fact, when I started to use it in 201*, it didn't have much more, and it changed our stack for good.
Redis from Python
The common way to use Redis from Python is the redis-py project, published on PyPI as the redis package.
You pip install it:
pip install redis
After that, you create a client instance, and you can use redis commands:
>>> from redis import Redis
>>> # decode_responses tells redis-py to return
>>> # strings and not bytes, which is often
>>> # what you want
>>> client = Redis(decode_responses=True)
>>> client.set('who_did_this', "python")
True
>>> client.rpush('fruits', "apple", "kiwi", "banana")
3
Which you can now read from another Python process, or even from redis-cli:
$ redis-cli
127.0.0.1:6379> get who_did_this
"python"
127.0.0.1:6379> lrange fruits 0 -1
1) "banana"
2) "kiwi"
3) "apple"
So what do you do with all this?
Personally, Redis is my default cache and session store for any web project. I don't even think about it anymore, I just set it up blindly. It's transparent, and easy to migrate to, but also easy to migrate from if I ever want to stop.
Django even has official support for caching in Redis now, and there is a good package for sessions.
That takes care of two bottlenecks for the price of 5 minutes of work, and I know it's rock solid.
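For reference, here is a minimal sketch of what that looks like in the settings, assuming Django 4.0+ (which ships a Redis cache backend) and a default local Redis:

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    }
}

# Store the sessions in the cache we just configured
SESSION_ENGINE = "django.contrib.sessions.backends.cache"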
I also use redis for task queues, and I'll write about it in a different article. In short, I usually go for huey first, and move from that if requirements change. It's almost never a bad start.
None of those require you to know how to set and get values, since they do all that for you.
So what do I use the commands for?
As an intermediary between file persistence and a full SQL DB. Web scrapers, deployment scripts, API mirroring... Anything that needs steps and state, but for which I don't want to commit to a full schema.
Specific caching. Sometimes caching is more than key/value: you need order, priority or links. Lists, hashes, sets and sorted sets provide for that. Expiration or eviction can be about more than elapsed time.
Inter-process communication. If I share a big cake between multiple Python VMs, or if I have data I want available in all the jobs of my task queue, Redis is the perfect candidate.
Basically, anything I don't want to store in Postgres, SQLite or a JSON file will go in Redis first. I may think of an alternative after that, but as a first place to go, it's a decent guess almost all the time.
Not having to rely on the file system, being able to share data between workers, and sparing your db from a lot of writes that don't need the whole DBMS overhead in the first place is a huge win.
And because it's cheap and easy to do, I'm more likely to do so.
A few more things
Despite living in RAM, you won't lose your data if Redis is shut down: it regularly snapshots the data to disk (and can be configured to log every write), then loads it back on start.
Despite having super nice defaults, Redis is very configurable. The RAM it takes, the frequency of disk snapshots and password access are all settings you can tweak.
Redis is already very useful on the same machine as your code, but it is also a good way to share data between several machines. Expose the port to them, and let everybody access it.
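For example, once Redis is configured to listen on an interface the other machines can reach, they can connect by passing the host (the IP here is made up):

$ redis-cli -h 192.168.1.20
192.168.1.20:6379>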
There is a more complex product called "Redis Stack" that brings even more goodies to redis. And an optional cloud offering. Still FOSS and runs locally though. JSON support is nice.
Mind your RAM. It's easy to cram everything in redis, and memory is cheap nowadays, but it doesn't mean you should not be aware of the price you are paying.
There is a great UI, RedisInsight, that lets you inspect your Redis server content. Yep, it's FOSS, and yes, it's free. But it works with the commercial cloud offering too. I must sound like an ad for Redis at this point, but honestly they brought so much value to my professional life I might as well.
It's common to namespace the keys using colons (:). If I want to store the list of tags for a video with id 789, I'd name the key "video:789:tags". It's not mandatory, but it's good practice.
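E.g., from Python, with made-up ids and tags:

>>> client.rpush('video:789:tags', 'python', 'redis')
2
>>> client.lrange('video:789:tags', 0, -1)
['python', 'redis']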