---
title: Empowering Django with Background Workers
class: text-center
highlighter: shiki
transition: slide-left
mdc: true
monaco: false
themeConfig:
  primary: '#0c4b33'
---
# Empowering <logos-django class="[&>path]:fill-white! h-15 w-43"/> with Background Workers
## Jake Howard{.mt-8}
<ul class="list-none! [&>li]:m-0!">
<li>Senior Systems Engineer @ Torchbox <mdi-fire class="fill-white"/></li>
<li>Core, Security & Performance teams @ Wagtail <logos-wagtail class="fill-white"/></li>
</ul>
<ul class="list-none! text-sm [&>li]:m-0! mt-5">
<li><mdi-earth /> theorangeone.net</li>
<li><mdi-github /> @RealOrangeOne</li>
<li><mdi-twitter /> @RealOrangeOne</li>
<li><mdi-mastodon /> @jake@theorangeone.net</li>
</ul>
<div class="absolute right-3 bottom-3">
<img src="/dceu24-qrcode.png" width="140px" />
</div>
<!--
- Hi
- I'm Jake
- Senior Systems Engineer at Torchbox
- I'm also on the security team, and as of last week the core team for Wagtail
- Leading Django-based CMS
- I exist in many places on the internet
- Here to talk about Background Workers
- What they are
- How to use them
- Exciting things _hopefully_ coming to Django
-->
---
layout: center
---
# Django is a web framework
```mermaid
flowchart LR
U(User 🧑‍💻)
D[\Django/]
U---->|Request|D
D---->|Response|U
```
<style>
.mermaid {
text-align: center;
}
</style>
<!--
- Django is a web framework
- It's a magic box which turns HTTP requests into HTTP responses
- What you do inside that box is up to you
- For something like a blog, that's probably as far as it needs to go
-->
---
layout: full
---
# Django isn't _just_ for websites
```mermaid
flowchart TD
U[User 🧑‍💻]
D[\Django/]
DB[(Database)]
E>Email]
EA[External API]
V[[Video Transcoding]]
R[Reporting]
ML((Machine<br>Learning))
U<--->D
D---DB
D-..-E & EA & V & R & ML
```
<style>
.mermaid {
text-align: center;
}
</style>
<!--
- For a full web application, you need a little more than that
- Not just "keep information in a database"
- Notification emails
- Talk to external services
- Transcoding video
- Complex reporting
- It's 2024, so lots of ML
- For many of these, you need code which runs outside the magic box
- You don't want your user waiting whilst these happen
- If you had to wait whilst YouTube transcoded all your videos, you'd get pretty annoyed
-->
---
layout: full
---
<v-click>
# Background Workers?
</v-click>
```mermaid
flowchart TD
U[User 🧑‍💻]
D[\Django/]
E>Email]
EA[External API]
V[[Video Transcoding]]
R[Reporting]
ML((Machine<br>Learning))
B{{<strong>Background Worker</strong>}}
U<-->D
D-..-B
B---E & EA & V & R & ML
```
<style>
.mermaid {
text-align: center;
}
</style>
<!--
- You need a background worker
- But[click]
- What are background workers
- Let you offload complexity outside of the request-response cycle
- To be run somewhere else, potentially at a later date
- They keep requests quick
- Move the slow bits somewhere else
- User doesn't have to wait
- Improves throughput and latency
-->
---
layout: section
---
## Background worker architecture
```mermaid
flowchart LR
D[\Django/]
S[(Queue Store)]
R1{Runner}
R2{Runner}
R3{Runner}
D<----->S<-....->R1 & R2 & R3
```
<!--
- How does this work?
- Web process submits a function to be run
- Stored in the queue store
- A runner then grabs a task, runs it, and returns the result to the queue store
- You can retrieve its status later if needed
-->
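---
layout: center
---
A minimal sketch of that flow, using only the Python standard library. The in-process `queue.Queue`, the `results` dictionary and the `enqueue` / `runner` names are illustrative assumptions standing in for a real queue store and separate runner processes:

```python
import queue
import threading
import uuid

tasks = queue.Queue()  # stands in for the queue store
results = {}  # task ID -> (status, value)


def enqueue(func, *args):
    """Called by the web process: record the task and return immediately."""
    task_id = str(uuid.uuid4())
    results[task_id] = ("PENDING", None)
    tasks.put((task_id, func, args))
    return task_id


def runner():
    """Runner loop: grab a task, run it, and store the outcome."""
    while True:
        task_id, func, args = tasks.get()
        try:
            results[task_id] = ("DONE", func(*args))
        except Exception as exc:
            results[task_id] = ("FAILED", repr(exc))
        finally:
            tasks.task_done()


# Three runners, as in the diagram
for _ in range(3):
    threading.Thread(target=runner, daemon=True).start()

task_id = enqueue(sum, [1, 2, 3])
tasks.join()  # demo only: a web process would not wait here
status, value = results[task_id]
```

The web process only ever touches `enqueue` and `results`; a real system would persist both somewhere the runners can reach, such as a database or Redis.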
---
layout: section
---
# When?
<!--
- Background workers are a very useful tool
- But that doesn't mean they're useful for everything, all the time
- As with all great things: "It depends"
- Trade-off between complexity and functionality
- A few things to consider
-->
---
layout: cover
background: https://images.unsplash.com/photo-1518729371765-043e54eb5674?q=80&w=1807&auto=format&fit=crop&ixlib=rb-4.0.3
---
# Does it take time?{.text-right}
<!--
- Does it take time
- _Could_ it take time
- Don't want to make the user wait
- Unable to close the tab or do something else
- Go off and do it in the background, and let them know whether it's done
- Even if that's by polling it in the browser
-->
---
layout: fact
---
## Does it leave your infrastructure?{.mb-5}
```mermaid
flowchart BT
D[\Django/]
subgraph Slow / Unreliable
E>Email]
EA[External API]
V[[Video Transcode]]
R[Reporting]
ML((Machine<br>Learning))
end
subgraph Fast & Reliable
DB[(Database)]
C[(Cache)]
end
D---DB & C
D-.-E & EA & V & R & ML
```
<!--
- Leaving your infrastructure
- The core components (Server, DB, Cache etc) you control and can closely monitor
- And are in a good position to fix it if something goes wrong
- That's not true for external APIs
- It's someone else's SRE team
- Their performance characteristics shouldn't affect your app
-->
---
layout: cover
background: https://images.unsplash.com/photo-1518770660439-4636190af475?q=80&w=3870&auto=format&fit=crop&ixlib=rb-4.0.3
---
# Specialized hardware?
<!--
- Maybe it's less about when, more about where?
- Maybe it's more about the hardware it runs on
- GPUs
- Loads of RAM
- External hardware
- Isolated network
-->
---
layout: cover
background: https://images.unsplash.com/photo-1711606815631-38d32cdaec3e?q=80&w=2070&auto=format&fit=crop&ixlib=rb-4.0.3
---
## Example:
# Complex reporting
<!--
- An example: Complex reporting
- Something analytical, crunching lots of data
- It might be fast locally
- As your application grows, there'll be more data, so it'll likely take a lot longer
- Rather than force the user to wait, let them get the data when it's ready
- They can get back on with their day
- Web servers can get back to processing other requests
-->
---
layout: section
---
# Background Workers in
<logos-django class="[&>path]:fill-white! h-fit w-60 -mt-20"/>
<!--
- Back to Django
- This is DjangoCon after all
- In Python and Django, there are lots of different frameworks to achieve this
-->
---
layout: image-right
image: https://images.unsplash.com/photo-1444703686981-a3abbc4d4fe3?q=80&w=1740&auto=format&fit=crop&ixlib=rb-4.0.3
---
# Libraries
- Celery<br><br>
- arq
- Django DB Queue
- Django Lightweight Queue
- Django Too Simple Q
- Django-Q
- Django-Q2
- Dramatiq
- Huey
- RQ
- Taskiq
- ...
<!--
- All require an external library
- And possibly some external infrastructure
- Celery is probably the biggest one
- But it's not all that exists
- So many different libraries exist
- With different strengths / weaknesses
- Different learning curves (or cliffs)
-->
---
layout: cover
background: https://images.unsplash.com/photo-1522096823084-2d1aa8411c13?q=80&w=1740&auto=format&fit=crop&ixlib=rb-4.0.3
---
## Example:
# Email <mdi-email-fast-outline />
<!--
- Let's look at an example, sending an email
- Very common functionality
- Let's imagine a CMS
- For totally unbiased reasons
- When a page is published, send an email to everyone subscribed
-->
---
layout: center
---
# Sending an email
```python {all|7|8|9-14|all}
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.template.loader import render_to_string

from wagtail.models import Page

for user in page.subscribers.iterator():
    email_content = render_to_string("notification-email.html", {"user": user, "page": page})
    send_mail(
        subject=f"A change to {page.title} has been published",
        message=email_content,
        from_email=None,  # Use the default sender email
        recipient_list=[user.email],
    )
```
<!--
- Here's the code we might write to do that
1. [click]Find the users to email
2. [click]Construct the email content
3. [click]Send the email
- [click]This works perfectly fine
- Scales _relatively_ well
- But has some issues
- If connecting to the email server takes a while, the user has to wait
- Usually only a few ms
- Might take a few seconds
- If something goes wrong with one email, the others won't send
- What if your email gateway is down altogether - do your requests start erroring?
- How do you handle it if they do?
- That web worker (eg gunicorn) can't process any other requests until this is done
-->
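---
layout: center
---
The "one failure stops the rest" problem can be sketched without a mail server. `fake_send_mail` and the addresses are made up for illustration; the function stands in for `send_mail` talking to a flaky gateway:

```python
sent = []


def fake_send_mail(recipient):
    """Hypothetical stand-in for send_mail: the gateway fails for one address."""
    if recipient == "b@example.com":
        raise ConnectionError("email gateway timed out")
    sent.append(recipient)


recipients = ["a@example.com", "b@example.com", "c@example.com"]
error = None

try:
    for recipient in recipients:
        fake_send_mail(recipient)
except ConnectionError as exc:
    # Left unhandled in a view, this would typically become an error response
    error = exc

# The loop stopped at the failure, so "c@example.com" was never emailed
```

Only the first recipient ends up in `sent`, and the request that triggered the loop carries the error.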
---
layout: center
---
```python {all|18|19|10|11-16|all|18-19|all}
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.template.loader import render_to_string

import django_rq
from wagtail.models import Page


def send_email_to_user(page: Page, user: User):
    email_content = render_to_string("notification-email.html", {"user": user, "page": page})
    send_mail(
        subject=f"A change to {page.title} has been published",
        message=email_content,
        from_email=None,  # Use the default sender email
        recipient_list=[user.email],
    )

for user in page.subscribers.iterator():
    django_rq.enqueue(send_email_to_user, page, user)
```
<!--
- Let's look at an example of how we might use background workers to help with this
- Use Django-RQ for this
1. [click]Find the users to email
2. [click]New: Start a task for each user
3. [click]Construct the email content
4. [click]Send the email
- [click]Most of this is exactly the same
- If you knew nothing of RQ, you could still maintain this code
- [click]Moving it to the background just quickly puts an item in the queue
- And then the user can get back on with their life
- Emails get sent out by the runners
- Multiple runners means they get sent out faster
- [click]Email sending is an easy action to move to the background
- It's a connection to an external API
- Variable latency
- Infrastructure you don't control
- All of that is simpler to handle when it's already running in the background
-->
---
layout: center
---
# Using <span v-click.hide="1">RQ</span><span v-click="1"><s class="opacity-60">RQ</s> Celery</span>
````md magic-move
```python
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.template.loader import render_to_string

import django_rq
from wagtail.models import Page


def send_email_to_user(page: Page, user: User):
    email_content = render_to_string("notification-email.html", {"user": user, "page": page})
    send_mail(
        subject=f"A change to {page.title} has been published",
        message=email_content,
        from_email=None,  # Use the default sender email
        recipient_list=[user.email],
    )

for user in page.subscribers.iterator():
    django_rq.enqueue(send_email_to_user, page, user)
```
```python {all|7-9,20|all}
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.template.loader import render_to_string

from wagtail.models import Page

from my_celery_config import app

@app.task
def send_email_to_user(page: Page, user: User):
    email_content = render_to_string("notification-email.html", {"user": user, "page": page})
    send_mail(
        subject=f"A change to {page.title} has been published",
        message=email_content,
        from_email=None,  # Use the default sender email
        recipient_list=[user.email],
    )

for user in page.subscribers.iterator():
    send_email_to_user.delay(page, user)
```
````
<style>
.slidev-vclick-hidden {
display: none;
}
</style>
<!--
- There's something I just said which might end up causing issues
- You'll notice I said "Using RQ" in that example
- That's because each worker library has its own API
- Its own features
- Its own configuration
- Its own caveats / implementation details
- What if we wanted to use Celery instead?
- [click]Well, that's easy
- [click]Just change a few lines
- [click]But therein lies the problem
- You had to make some changes!
- Sure, they're small, but this is only a tiny amount of code
- What if you wanted to support both?
-->
---
layout: image
image: /situation.png
backgroundSize: 50%
---
<!--
- It's hard enough having multiple options
- But how do you choose between them?
- Maybe you have experience with libraries already
- Do you have the time (and patience) to test each one out?
- Maybe you already have a standard you need to work to
- Maybe you need specific features
- If you're new to Django, do you really want to spend the time weighing them all up?
- Knowing it could bite you as you grow or need a specific feature
- Requiring a lot of time refactoring in future
- What about library maintainers
- Like, say, Wagtail
- Do you write and maintain integrations for _all_ task libraries
- Do you choose the big one(s) and force your users' hands?
- Do you expose a hook and let your users integrate themselves?
- It adds a huge maintenance burden, whichever you choose
- There isn't really a right answer
-->
---
layout: image
image: /ridiculous.png
backgroundSize: 49%
---
<!--
- There _should_ be one universal standard which combines them all
- A single API to help developers use a library
- Without tying their hands
- First-party
- Allowing library developers to depend on it instead of supporting every separate API
- Scale easily as your needs change
- Be easy to get started with for small projects
- But feature-packed for larger deployments
- Allowing easy stubbing out during tests
- Tests are important!
-->
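---
layout: center
---
One way to picture such a standard is a small backend interface that application code targets, with the concrete queue chosen elsewhere. Every name here (`Backend`, `RunNowBackend`, `RecordingBackend`, `get_backend`) is a hypothetical illustration, not the API of Django or of any existing library:

```python
from typing import Any, Callable, Protocol


class Backend(Protocol):
    def enqueue(self, func: Callable[..., Any], *args: Any) -> None: ...


class RunNowBackend:
    """Runs the task straight away, in the calling process."""

    def enqueue(self, func, *args):
        func(*args)


class RecordingBackend:
    """Runs nothing; records calls so tests can assert on them."""

    def __init__(self):
        self.calls = []

    def enqueue(self, func, *args):
        self.calls.append((func, args))


BACKENDS = {"run-now": RunNowBackend, "recording": RecordingBackend}


def get_backend(name: str) -> Backend:
    # In a real project the name would come from configuration
    return BACKENDS[name]()


def notify(backend: Backend, greetings: list, name: str) -> None:
    # Application code: identical whichever backend is in use
    backend.enqueue(greetings.append, f"Hello {name}")


greetings = []
notify(get_backend("run-now"), greetings, "Jake")

recorder = get_backend("recording")
notify(recorder, greetings, "Sam")
```

`notify` never changes; only the backend handed to it does, which is also what makes stubbing it out in tests cheap.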
---
layout: fact
---
## Introducing*:{.mb-5}
# `django.tasks`
<div class="absolute right-1/2 translate-x-1/2 mt-4">
<img src="/django-tasks-qrcode.png" width="140px" />
</div>
<!--
- In-progress API spec for first-party background workers in Django
-->
---
layout: image-right
image: https://images.unsplash.com/photo-1674027444485-cec3da58eef4?q=80&w=1932&auto=format&fit=crop&ixlib=rb-4.0.3
class: flex items-center text-xl
---
- API contract between library and application developers
- Swappable backends through `settings.py`
- Built-in backends:
  - ORM
  - "Immediate"
  - "Dummy"
- Django 5.2 🤞
- Backport for 4.2+
<!--
- An API contract between worker library maintainers and application developers
- Compatibility layer between Django and their native APIs
- Hopefully the promise of "Write once, run anywhere"
- Built-in worker queues
- ORM based (production grade)
- "Immediate" (ie doesn't background anything) loaded by default
- Dummy (for testing)
- Hopefully landing in Django 5.2
- Backwards compatible with Django 4.2, to allow easy adoption
-->
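---
layout: center
---
Swapping backends is then a settings change. This sketch uses the setting name and dotted paths of the `django-tasks` backport at the time of writing; treat them as assumptions for the eventual `django.tasks`, since they may change before it lands:

```python
# settings.py
INSTALLED_APPS = [
    # ... your existing apps ...
    "django_tasks",
    "django_tasks.backends.database",
]

TASKS = {
    "default": {
        # Swap this dotted path to change backend, for example to
        # "django_tasks.backends.immediate.ImmediateBackend" or
        # "django_tasks.backends.dummy.DummyBackend"
        "BACKEND": "django_tasks.backends.database.DatabaseBackend",
    }
}
```

With the database backend, the backport also expects a separate worker process (its `db_worker` management command) to pick up and run the queued tasks.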
---
layout: center
---
# <span v-click.hide="1">Using Celery</span><span v-click="1">Using <code>django.tasks</code></span>
````md magic-move
```python
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.template.loader import render_to_string

from wagtail.models import Page

from my_celery_config import app

@app.task
def send_email_to_user(page: Page, user: User):
    email_content = render_to_string("notification-email.html", {"user": user, "page": page})
    send_mail(
        subject=f"A change to {page.title} has been published",
        message=email_content,
        from_email=None,  # Use the default sender email
        recipient_list=[user.email],
    )

for user in page.subscribers.iterator():
    send_email_to_user.delay(page, user)
```
```python
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.template.loader import render_to_string

from wagtail.models import Page

from django.tasks import task

@task()
def send_email_to_user(page: Page, user: User):
    email_content = render_to_string("notification-email.html", {"user": user, "page": page})
    send_mail(
        subject=f"A change to {page.title} has been published",
        message=email_content,
        from_email=None,  # Use the default sender email
        recipient_list=[user.email],
    )

for user in page.subscribers.iterator():
    send_email_to_user.enqueue(page, user)
```
````
<style>
.slidev-vclick-hidden {
display: none;
}
</style>
<!--
- Let's look at the same code example as before
- This is tied to Celery
- If I want to support RQ too, I'd have to duplicate some parts
- Instead, let's rewrite this once to use `django.tasks`[click]
- Still simple, clear, approachable and easy to use
- If I say so myself
- If we swapped to RQ: 0 lines need to change
- If a new library comes out, 0 lines need to change
- If this is in a library, not my own code, I'm not constrained by their preferences
- And the maintainer doesn't have extra work to support my preferences
- For testing, I can use an in-memory backend
- With 0 lines changed
-->
---
layout: center
---
```python
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.template.loader import render_to_string

from wagtail.models import Page

for user in page.subscribers.iterator():
    email_content = render_to_string("notification-email.html", {"user": user, "page": page})
    send_mail(
        subject=f"A change to {page.title} has been published",
        message=email_content,
        from_email=None,  # Use the default sender email
        recipient_list=[user.email],
    )
```
<br />
<v-click>
```python
# settings.py
EMAIL_BACKEND = "django.core.mail.backends.tasks.SMTPEmailBackend"
```
</v-click>
<!--
- In this case, we can actually make it even easier
- Because email is such a common use case, and so easy to extract
- Go back to the simple implementation
- No background workers in sight
- [click]Use the built-in task email backend
- Emails are magically sent in the background automatically
- Without additional work
-->
---
layout: image-right
image: /soon.png
class: flex justify-center text-2xl flex-col
---
# Q: Why something new?
<!--
- I'm sure you're thinking "Why something new?"
- Celery already has a borderline monopoly on task queues
- Writing a production-grade task queue is hard
- As I've been told whilst working on this DEP
- Why not just vendor something existing?
- If not Celery, then something else
- That's not really the goal
- A shared API contract is
- The built-in version will hopefully become great
- But must be done with careful planning and consideration
- Django needs to remain the stable and reliable base it always has been
-->
---
layout: image-right
image: https://images.unsplash.com/photo-1525683879097-8babce1c602a?q=80&w=1335&auto=format&fit=crop&ixlib=rb-4.0.3
class: flex justify-center text-xl flex-col
---
# Q: Why something built-in?
- Reduce barrier to entry
- Reduce cognitive load
- Reduce complexity for smaller projects
- Improve interoperability
- Use what's already there
- A common API
<!--
- Being built-in reduces the barrier to entry
- Integrating becomes much simpler
- There's 1 API to learn
- It will last you a while
- Scale with your needs
- A developer can join a new project and already be productive
- A common API also helps library maintainers
- Maintaining a large library is work enough
- Without needing to think about how to move code to the background
- If Django can take complexity off you, great
- Currently, it's not really an option
- The burden is too great
- No additional dependencies for your library
- Just import from Django and you're set
- The user can use what they want
- Or what's suitable for their scale and use case
- Now the barrier is reduced, the ecosystem can flourish
- Libraries can assume background workers, without any additional burden
- The ORM backend should work for the majority of projects
- If you just want to send emails in the background, you probably don't need Celery or RQ
- It's overkill
- A vendored solution makes it the easiest to get started with
- Tweak some settings, run an extra process, and you're done.
-->
---
layout: center
transition: fade
---
![](/celery.svg){.h-32.mx-auto}
## vs
![](/postgres.png){.h-36.mx-auto}
<style>
.slidev-layout {
background: white;
color: black;
text-align: center;
}
</style>
<!--
- ORM at scale
- For some scales, an ORM-based worker might not be viable
- The Sentrys and Instagrams of the world
- Postgres scales pretty well, but sometimes not well enough
- And that's ok!
-->
---
layout: center
---
![](/elasticsearch.png){.h-32.mx-auto}
## vs
![](/postgres.png){.max-h-36.mx-auto}
<style>
.slidev-layout {
background: white;
color: black;
text-align: center;
}
</style>
<!--
- But the same is also true for Postgres FTS vs ElasticSearch
- A debate that's been going on for a while
- And I've had many times
- ElasticSearch is quite likely better for the ~10% of people who need it
- But that doesn't mean the other 90% of people won't be happy with PostgreSQL
- Probably wouldn't benefit from ElasticSearch anyway
- Definitely won't get a return on the extra hosting cost and complexity
- They'll be perfectly happy with Postgres FTS
- Let them get started the easiest way possible
- We can still invite them into ElasticSearch when they're ready
-->
---
layout: section
---
# Where are we now?
<!--
- I mean, other than Vigo
-->
---
layout: image
image: /dep.png
---
---
layout: section
---
# `pip install django-tasks`
<div class="absolute right-1/2 translate-x-1/2 mt-6">
<img src="/django-tasks-qrcode.png" width="140px" />
</div>
<!--
- You can play with this right now!
- Download it, play around with it
- The dummy backend is great for testing
- The immediate backend can help get you started
- The ORM backend is where the magic happens
- Tell me about all the bugs in my code
- The more testing we can do now, the better
- There's still work to do
-->
---
layout: section
---
# Where will we be _soon_™?
<!--
- More testing
- Upstreaming
- That's the big benefit
- Else it really is just another standard
- Once `django-tasks` is in a better state, it can become `django.tasks`
- Hopefully in time for the 5.2 release window
- Adoption
- The more people know about this, the better it is for everyone
- Developers can start working on integrating now
- Knowing they can trivially upgrade once it's in Django
-->
---
layout: cover
background: /celery.svg
---
# Is this the end?
<style>
.slidev-layout {
background: white;
background-size: contain !important;
}
</style>
<!--
- Is this the end for Celery and the like?
- Not at all!
- You've not made a mistake
- It's a great choice
- They have quite a head start
- This is much more about usability and flexibility
- If you need certain features, keep using them!
- Now you have the option of a Django-native API
- Which could even be Celery under the hood
-->
---
layout: image-right
image: https://images.unsplash.com/photo-1451187580459-43490279c0fa?q=80&w=1744&auto=format&fit=crop&ixlib=rb-4.0.3
class: flex justify-center flex-col text-xl
---
# Out of scope
- Completion / failed hooks
- Bulk queueing
- Automated task retrying
- Task runner API
- Unified observability
- Cron-based scheduling
- Task timeouts
- Swappable argument serialization
- ...
<!--
- The world of background workers is huge
- There are countless nice features
- Not everything is making it into the initial version(s)
- And that's ok!
- Existing libraries have a head start
- But I hope we can slowly catch them up
- Bringing the stability and longevity guarantees that come with Django
- Doesn't mean they'll never come
-->
---
layout: cover
background: https://images.unsplash.com/photo-1519187903022-c0055ec4036a?q=80&w=1335&auto=format&fit=crop&ixlib=rb-4.0.3
---
# The future is bright
<!--
- The future is bright though
- In time, I see more and more people reaching for `django.tasks`
- And background workers in general
- Moving work to the background will make Django apps _seem_ faster
- Improve throughput
- Reduce latency
- Improve reliability
- Gone are the days of needing additional research and testing to find the tooling you need
- You can use the ones built-in to Django
- And as you scale, it's easy to change
- _without_ rewriting half your application
- With all the knowledge to make an informed decision
-->
---
layout: section
---
# What's next?
<div class="absolute right-1/2 translate-x-1/2 mt-12">
<img src="/django-tasks-qrcode.png" width="140px" />
</div>
<!--
- Time to turn the dream into a reality!
- If you've realised you could use a background queue, give `django-tasks` a try
- Test it out
- Report back your issues
- Suggest improvements
- If you want to get involved, please do!
- There's plenty of work to do
- And I can't do it alone!
- If you maintain a worker library
- Or have been burned by one...
-->
---
layout: center
class: text-center text-xl
---
# Let's chat!
<ul class="list-none! [&>li]:m-0!">
<li><mdi-earth /> theorangeone.net</li>
<li><mdi-github /> @RealOrangeOne</li>
<li><mdi-twitter /> @RealOrangeOne</li>
<li><mdi-mastodon /> @jake@theorangeone.net</li>
</ul>
<div class="absolute right-1/2 translate-x-1/2 mt-3">
<img src="/dceu24-qrcode.png" width="140px" />
</div>
<style>
.slidev-layout {
background-color: #17181c;
color: #e85537;
}
</style>
---
layout: end
---
END