Add speaker notes

Jake Howard 2024-06-01 22:05:28 +01:00
parent d5365e0282
commit 24b9e43712
Signed by: jake
GPG key ID: 57AFB45680EDD477

436
slides.md

@ -29,6 +29,19 @@ themeConfig:
<img src="/dceu24-qrcode.png" width="110px" />
</div>
<!--
- Hi
- I'm Jake
- Senior Systems Engineer at Torchbox
- I'm also on the security team, and as of last week the core team for Wagtail
- Leading Django-based CMS
- I exist in many places on the internet
- Here to talk about Background Workers
- What they are
- How to use them
- Exciting things _hopefully_ coming to Django
-->
---
layout: center
---
@ -50,6 +63,13 @@ flowchart LR
}
</style>
<!--
- Django is a web framework
- It's a magic box which turns HTTP requests into HTTP responses
- What you do inside that box is up to you
- For something like a blog, that's probably as far as it needs to go
-->
---
layout: full
---
@ -79,12 +99,28 @@ flowchart TD
}
</style>
<!--
- For a full web application, you need a little more than that
- Not just "keep information in a database"
- Notification emails
- Talk to external services
- Transcoding video
- Complex reporting
- It's 2024, so lots of ML
- For many of these, you need code which runs outside the magic box
- You don't want your user waiting whilst these happen
- If you had to wait whilst YouTube transcoded all your videos, you'd get pretty annoyed
-->
---
layout: full
---
# Background Workers!
<v-click>
# Background Workers?
</v-click>
```mermaid
flowchart TD
@ -97,7 +133,7 @@ flowchart TD
R[Reporting]
ML((Machine<br>Learning))
B{{<strong>Background Processing</strong>}}
B{{<strong>Background Worker</strong>}}
U<-->D
@ -112,15 +148,24 @@ flowchart TD
}
</style>
<!--
- You need a background worker
- But[click]
- What are background workers
- Let you offload complexity outside of the request-response cycle
- To be run somewhere else, potentially at a later date
- They keep requests quick
- Move the slow bits somewhere else
- So the user doesn't have to wait
- This improves throughput and latency
-->
---
layout: section
---
# Background Workers?
---
layout: fact
---
## Background worker architecture
```mermaid
flowchart LR
@ -133,12 +178,29 @@ flowchart LR
D<----->S<-....->R1 & R2 & R3
```
<!--
- How does this work?
- Web process submits a function to be run
- Stored in the queue store
- A runner then grabs a task, runs it, and returns the result to the queue store
- You can retrieve its status later if needed
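
The flow above can be sketched in memory. This is only an illustration: `QueueStore`, `Task`, `enqueue` and `run_next` are hypothetical names, not the API of any real worker library, and a real queue store would be Redis, a database table or similar.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Task:
    func: Callable[..., Any]
    args: tuple
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "NEW"
    result: Any = None


class QueueStore:
    """Hypothetical in-memory stand-in for the queue store."""

    def __init__(self):
        self._pending = deque()
        self._tasks = {}

    def enqueue(self, func, *args):
        # Called by the web process: record the task and return immediately.
        task = Task(func, args)
        self._tasks[task.id] = task
        self._pending.append(task.id)
        return task.id

    def run_next(self):
        # Called by a runner: grab one task, run it, store the outcome.
        if not self._pending:
            return False
        task = self._tasks[self._pending.popleft()]
        try:
            task.result = task.func(*task.args)
            task.status = "COMPLETE"
        except Exception as exc:
            task.result = exc
            task.status = "FAILED"
        return True

    def get(self, task_id):
        # Called later by the web process to check on the task.
        return self._tasks[task_id]


store = QueueStore()
task_id = store.enqueue(sum, [1, 2, 3])  # web process submits a function
store.run_next()  # a runner picks it up and stores the result
```

The web process only ever touches `enqueue` and `get`; the slow call happens inside `run_next`, which lives in a separate runner process in a real deployment.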
-->
---
layout: section
---
# When?
<!--
- Background workers are a very useful tool
- But that doesn't mean they're useful for everything, all the time
- As with all great things: "It depends"
- The added complexity may not be worth it
- A few things to consider
-->
---
layout: cover
background: https://images.unsplash.com/photo-1518729371765-043e54eb5674?q=80&w=1807&auto=format&fit=crop&ixlib=rb-4.0.3
@ -146,6 +208,15 @@ background: https://images.unsplash.com/photo-1518729371765-043e54eb5674?q=80&w=
# Does it take time?{.text-right}
<!--
- Something which takes time
- Something which _could_ take time
- Rather than have the user wait
- Unable to close the tab or do something else
- Go off and do it in the background, and let them know whether it's done
- Even if that's by polling it in the browser
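
A rough sketch of the polling idea, written in Python for consistency even though a browser would do this in JavaScript. `poll_until_done` and the status strings are assumptions for illustration only.

```python
import time


def poll_until_done(get_status, interval=0.01, timeout=1.0):
    """Call get_status() until it reports a finished state or we give up."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in {"COMPLETE", "FAILED"}:
            return status
        time.sleep(interval)
    return "TIMEOUT"


# Hypothetical status source: reports "RUNNING" twice, then "COMPLETE".
_statuses = iter(["RUNNING", "RUNNING", "COMPLETE"])
final_status = poll_until_done(lambda: next(_statuses))
```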
-->
---
layout: fact
---
@ -174,6 +245,15 @@ flowchart BT
D-.-E & EA & V & R & ML
```
<!--
- Leaving your infrastructure
- The core components (Server, DB, Cache etc) you control and can closely monitor
- And are in a good position to fix it if something goes wrong
- That's not true for external APIs
- It's someone else's SRE team
- Their performance regressions shouldn't affect your app
-->
---
layout: cover
background: https://images.unsplash.com/photo-1518770660439-4636190af475?q=80&w=3870&auto=format&fit=crop&ixlib=rb-4.0.3
@ -181,6 +261,15 @@ background: https://images.unsplash.com/photo-1518770660439-4636190af475?q=80&w=
# Specialized hardware?
<!--
- Maybe it's less about when, more about where?
- Maybe it's more about the hardware it runs on
- GPUs
- Loads of RAM
- External hardware
- Isolated network
-->
---
layout: cover
background: https://images.unsplash.com/photo-1711606815631-38d32cdaec3e?q=80&w=2070&auto=format&fit=crop&ixlib=rb-4.0.3
@ -189,6 +278,17 @@ background: https://images.unsplash.com/photo-1711606815631-38d32cdaec3e?q=80&w=
## Example:
# Complex reporting
<!--
- An example: Complex reporting
- Something analytical, crunching lots of data
- Initially, it might take a few seconds, which is fine
- You build a CSV as part of the request with something like `pandas`
- As your application grows, there'll be more data, so it'll likely take a lot longer
- Rather than force the user to wait, let them get the data when it's ready
- They can get back on with their day
- Web servers can get back to processing other requests
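
A small sketch of moving the report out of the request. It uses the standard library `csv` module rather than `pandas` to stay self-contained, and `finished_reports` and `generate_report` are hypothetical names standing in for wherever finished reports are kept.

```python
import csv
import io


def build_report_csv(rows):
    """Build a CSV report in memory; this is the part that slows as data grows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["page", "views"])
    for page, views in rows:
        writer.writerow([page, views])
    return buffer.getvalue()


# Hypothetical stand-in for file storage or a database table.
finished_reports = {}


def generate_report(report_id, rows):
    # Run by a background worker instead of inside the request.
    finished_reports[report_id] = build_report_csv(rows)


generate_report("weekly", [("home", 120), ("blog", 45)])
```

The request handler would only enqueue `generate_report` and return; the user collects the entry from `finished_reports` once it exists.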
-->
---
layout: section
---
@ -197,30 +297,20 @@ layout: section
<logos-django class="[&>path]:fill-white! h-fit w-60 -mt-20"/>
---
layout: cover
background: /celery.svg
---
# Celery!
<style>
.slidev-layout {
background: white;
background-size: contain !important;
}
</style>
<!--
- Back to Django
- This is DjangoCon after all
- In Python and Django, there are lots of different frameworks to achieve this
-->
---
layout: image-right
image: https://images.unsplash.com/photo-1444703686981-a3abbc4d4fe3?q=80&w=1740&auto=format&fit=crop&ixlib=rb-4.0.3
---
# Others...
# Libraries
<Transform :scale="1.05">
- ~~Celery~~
- Celery<br><br>
- arq
- Django DB Queue
- Django Lightweight Queue
@ -233,7 +323,15 @@ image: https://images.unsplash.com/photo-1444703686981-a3abbc4d4fe3?q=80&w=1740&
- Taskiq
- ...
</Transform>
<!--
- All require an external library
- And possibly some external infrastructure
- Celery is probably the biggest one
- But it's not all that exists
- So many different libraries exist
- With different strengths / weaknesses
- Different learning curves (or cliffs)
-->
---
layout: cover
@ -243,6 +341,14 @@ background: https://images.unsplash.com/photo-1522096823084-2d1aa8411c13?q=80&w=
## Example:
# Email <mdi-email-fast-outline />
<!--
- Let's look at an example, sending an email
- Very common functionality
- Let's imagine a CMS
- For totally unbiased reasons
- When a page is published, send an email to everyone subscribed
-->
---
layout: center
---
@ -266,6 +372,26 @@ for user in page.subscribers.iterator():
)
```
<!--
- Here's the code we might write to do that
1. [click]Find the users to email
2. [click]Construct the email content
3. [click]Send the email
- [click]This works perfectly fine
- Scales _relatively_ well
- But has some issues
- If connecting to the email server takes a while, the user has to wait
- Usually only a few ms
- Might take a few seconds
- Subsequent emails are delayed whilst we process the earlier ones
- If something goes wrong with one email, the others won't send
- What if your email gateway is down altogether - do your requests start erroring?
- How do you handle it if they do?
- That web worker (eg gunicorn) can't process any other requests until this is done
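
The "one failure stops the rest" issue can be shown with a toy version of the loop. Everything here (`send_email`, `notify_all`, `EmailGatewayError`, the addresses) is hypothetical and only simulates a gateway error.

```python
class EmailGatewayError(Exception):
    pass


def send_email(address, sent):
    # Hypothetical sender: fails for one address to simulate a gateway error.
    if address == "broken@example.com":
        raise EmailGatewayError(address)
    sent.append(address)


def notify_all(addresses):
    """Send inline, one after another, as the request handler would."""
    sent = []
    try:
        for address in addresses:
            send_email(address, sent)
    except EmailGatewayError:
        # In a real view this would surface as an error response instead.
        pass
    return sent


delivered = notify_all(["a@example.com", "broken@example.com", "c@example.com"])
```

Only the first address is delivered: the third subscriber never gets their email because the second one failed.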
-->
---
layout: center
---
@ -292,6 +418,30 @@ for user in page.subscribers.iterator():
django_rq.enqueue(send_email_to_user, user)
```
<!--
- Let's look at an example of how we might use background workers to help with this
- Use Django-RQ for this
1. [click]Find the users to email
2. [click]New: Start a task for each user
3. [click]Construct the email content
4. [click]Send the email
- [click]Most of this is exactly the same
- If you knew nothing of RQ, you could still maintain this code
- [click]Moving it to the background just quickly puts an item in the queue
- And then the user can get back on with their life
- Emails get sent out by the runners
- Multiple runners means they get sent out faster
- [click]Email sending is an easy action to move to the background
- It's a connection to an external API
- Variable latency
- Infrastructure you don't control
- All of that is simpler to handle when it's already running in the background
-->
---
layout: center
---
@ -351,18 +501,65 @@ for user in page.subscribers.iterator():
}
</style>
<!--
- There's something I just said which might end up causing issues
- You'll notice I said "Using RQ" in that example
- That's because each worker library has its own API
- Its own features
- Its own configuration
- Its own caveats / implementation details
- What if we wanted to switch to Celery?
- [click]Well, that's easy
- [click]Just change a few lines
- [click]But therein lies the problem
- You had to make some changes!
- Sure, they're small, but this is only a tiny amount of code
- What if you wanted to support both?
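
To make the "support both" question concrete, here is a sketch of the glue code a project might end up writing. The two `enqueue_with_*` functions are stand-ins, not the real libraries; in practice they would wrap something like `django_rq.enqueue(func, *args)` and a Celery task's `.delay(*args)`.

```python
def enqueue_with_rq(func, *args):
    # Stand-in for a call such as django_rq.enqueue(func, *args).
    return ("rq", func.__name__, args)


def enqueue_with_celery(func, *args):
    # Stand-in for a call such as a Celery task's .delay(*args).
    return ("celery", func.__name__, args)


ENQUEUERS = {"rq": enqueue_with_rq, "celery": enqueue_with_celery}


def enqueue(func, *args, library="rq"):
    """Hypothetical shim that application code calls instead of either library."""
    return ENQUEUERS[library](func, *args)


def send_email_to_user(user_id):
    return f"emailed {user_id}"


queued = enqueue(send_email_to_user, 42, library="celery")
```

Every project or library that wants to stay neutral has to write and maintain its own version of this shim.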
-->
---
layout: image
image: /situation.png
backgroundSize: 50%
---
<!--
- It's hard enough having multiple options
- But how do you choose between them?
- If you've used multiple tools, you probably know which is best for you
- Do you have the time (and patience) to test each one out?
- Maybe you already have a standard you need to work to
- Do you need a background worker which supports `asyncio`?
- If you're new to Django, do you really want to spend the time weighing them all up?
- Knowing it could bite you as you grow or need a specific feature
- Requiring a lot of time refactoring in future
- What about library maintainers
- What if you built a library which has some background task needs
- Like, say, Wagtail
- Do you write and maintain integrations for _all_ task libraries
- Do you choose the big one(s) and force your users' hands?
- Do you expose a hook and let your users integrate themselves?
- It adds a huge maintenance burden, whichever you choose
- There isn't really a right answer
-->
---
layout: image
image: /ridiculous.png
backgroundSize: 49%
---
<!--
- There _should_ be one universal standard which combines them all
- A single API to help developers use a library, without tying their hands
- First-party, allowing library developers to depend on it instead of supporting every separate API
- Scale easily as your needs change
- Be easy to get started with for small projects
- But feature-packed for larger deployments
- Allowing easy stubbing out during tests
- Tests are important!
-->
---
layout: fact
---
@ -375,6 +572,10 @@ layout: fact
<img src="/django-tasks-qrcode.png" width="110px" />
</div>
<!--
- In-progress API spec for first-party background workers in Django
-->
---
layout: image-right
image: https://images.unsplash.com/photo-1674027444485-cec3da58eef4?q=80&w=1932&auto=format&fit=crop&ixlib=rb-4.0.3
@ -389,6 +590,19 @@ class: flex items-center text-xl
- "Dummy"
- Django 5.2 🤞
- Backport for 4.2+
<!--
- An API contract between worker library maintainers and application developers
- Compatibility layer between Django and their native APIs
- Hopefully the promise of "Write once, run anywhere"
- Built-in worker queues
- ORM based (production grade)
- "Immediate" (ie doesn't background anything) loaded by default
- Dummy (for testing)
- Hopefully landing in Django 5.2
- Backwards compatible with Django 4.2, to allow easy adoption
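
A sketch of what selecting a backend might look like in settings. The dotted paths are an assumption based on the `django-tasks` backport package and may change before anything lands in Django itself.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    # ...your existing apps...
    "django_tasks",
    "django_tasks.backends.database",
]

TASKS = {
    "default": {
        # Swap this dotted path to change backend without touching task code.
        "BACKEND": "django_tasks.backends.database.DatabaseBackend",
    }
}
```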
-->
---
layout: center
---
@ -419,7 +633,7 @@ for user in page.subscribers.iterator():
send_email_to_user.delay(user)
```
```python {all|7-9,20|all}
```python
from django.contrib.auth.models import User
from django.core.mail import send_mail
from django.template.loader import render_to_string
@ -449,6 +663,21 @@ for user in page.subscribers.iterator():
}
</style>
<!--
- Let's look at the same code example as before
- This is tied to Celery
- If I want to support RQ too, I'd have to duplicate some parts
- Instead, let's rewrite this once to use `django.tasks`[click]
- Still simple, clear, approachable and easy to use
- If I say so myself
- Now, in future, if we swapped to RQ (for whatever reason), exactly 0 lines need to change
- If a new library comes out, 0 lines need to change
- If this is in a library, not my own code, I can keep using the library no matter what worker I'm using
- And the maintainer doesn't need to special-case anything
- If I want to test this code, I can swap out the backend to an in-memory one, and interrogate it
- With 0 lines changed
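
The "swap the backend and interrogate it" point can be sketched without Django. This is a toy model, not the `django.tasks` implementation: `task`, `DummyBackend`, `ImmediateBackend` and `current_backend` are hypothetical names chosen to mirror the idea.

```python
class DummyBackend:
    """Hypothetical in-memory backend: records tasks instead of running them."""

    def __init__(self):
        self.results = []

    def enqueue(self, func, args):
        self.results.append((func.__name__, args))


class ImmediateBackend:
    """Hypothetical backend that runs the task straight away."""

    def enqueue(self, func, args):
        func(*args)


# Stand-in for a settings-driven choice of backend.
current_backend = ImmediateBackend()


def task(func):
    # Minimal decorator giving functions an enqueue() method.
    def enqueue(*args):
        current_backend.enqueue(func, args)

    func.enqueue = enqueue
    return func


@task
def send_email_to_user(user_id):
    print(f"Sending email to user {user_id}")


def publish_page(subscriber_ids):
    # Application code: identical whichever backend is configured.
    for user_id in subscriber_ids:
        send_email_to_user.enqueue(user_id)


# In a test, swap the backend and inspect what was queued.
current_backend = DummyBackend()
publish_page([1, 2])
```

`publish_page` and `send_email_to_user` are untouched by the swap; only the configured backend changes.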
-->
---
layout: center
---
@ -481,6 +710,16 @@ EMAIL_BACKEND = "django.core.mail.backends.tasks.SMTPEmailBackend"
</v-click>
<!--
- In this case, we can actually make it even easier
- Because email is such a common use case, and so easy to extract
- Go back to the simple implementation
- No background workers in sight
- [click]Use the built-in task email backend
- Emails are magically sent in the background automatically
- Without additional work
-->
---
layout: image-right
image: /soon.png
@ -489,6 +728,20 @@ class: flex justify-center text-2xl flex-col
# Q: Why something new?
<!--
- I'm sure you're thinking "Why something new?"
- Celery already has a borderline monopoly on task queues
- Writing a production-grade task queue is hard
- As I've been told whilst working on this DEP
- Why not just vendor something existing?
- If not Celery, then something else
- That's not really the goal
- Shared API contract is
- The built-in version will hopefully become great
- But must be done with careful planning and consideration
- Django needs to remain the stable and reliable base it always has been
-->
---
layout: image-right
image: https://images.unsplash.com/photo-1525683879097-8babce1c602a?q=80&w=1335&auto=format&fit=crop&ixlib=rb-4.0.3
@ -504,6 +757,32 @@ class: flex justify-center text-xl flex-col
- Use what's already there
- A common API
<!--
- Being built-in reduces the barrier to entry
- Integrating becomes much simpler
- There's 1 API to learn, and it will last you a while
- Much like the ORM has done for different DB engines
- A developer can join a new project and already be productive
- A common API also helps library maintainers
- Maintaining a large library is work enough
- Without needing to think about how to move code to the background
- If Django can take complexity off you, that's great
- Currently, it's not really an option
- The burden is just too great
- With this, no additional dependencies
- Just import from Django and you're set
- The user can use what they want
- Or what's suitable for their scale and use case
- Now the barrier is reduced, the ecosystem can flourish
- Libraries can start assuming background workers, without any additional burden
- The ORM backend should work for the majority of projects
- If you just want to send emails in the background, you probably don't need Celery
- It's overkill
- Even RQ is a bit much
- A vendored solution makes it the easiest to get started with
- Tweak some settings, run an extra process, and you're done.
-->
---
layout: center
transition: fade
@ -524,6 +803,14 @@ transition: fade
}
</style>
<!--
- ORM at scale
- For some scales, an ORM-based worker might not be viable
- The Sentrys and Instagrams of the world
- Postgres scales pretty well, but sometimes not well enough
- And that's ok!
-->
---
layout: center
---
@ -543,6 +830,20 @@ layout: center
}
</style>
<!--
- But the same is also true for Postgres FTS vs Elasticsearch
- A debate that's been going on for a while
- And I've had many times
- Elasticsearch is quite likely better for the ~10% of people who need it
- But that doesn't mean the other 90% of people won't be happy with PostgreSQL
- And probably wouldn't benefit from Elasticsearch anyway
- And definitely won't get a return on the extra hosting cost and complexity
- They'll be perfectly happy with Postgres FTS
- Let them get started the easiest way possible
- We can still invite them into Elasticsearch when they're ready
-->
---
layout: section
---
@ -555,26 +856,46 @@ image: /dep.png
---
---
layout: cover
layout: section
---
# `pip install django-tasks`
<div class="absolute right-1/2 translate-x-1/2 mt-12">
<QRCode
:width="120"
:height="120"
data="https://pypi.org/project/django-tasks/"
:dotsOptions="{ color: 'white' }"
/>
<div class="absolute right-1/2 translate-x-1/2 mt-6">
<img src="/django-tasks-qrcode.png" width="150px" />
</div>
<!--
- You can play with this right now!
- Download it, play around with it
- The dummy backend is great for testing
- The immediate backend can help get you started
- The ORM backend is where the magic happens
- Tell me about all the bugs in my code
- The more testing we can do now, the better
- There are still features to implement and improve
-->
---
layout: section
---
# Where will we be _soon_™?
<!--
- More testing
- Upstreaming
- The main benefit is in this becoming part of Django
- The DEP is approved (ish)
- Once `django-tasks` is in a better state, it can become `django.tasks`
- Hopefully in time for the 5.2 release window
- Adoption
- The more people know about this, the better it is for everyone
- Developers can start using `django-tasks` now, and swap for `django.tasks` later
- Or use both side-by-side in older versions of Django / older libraries
- The 2 will work correctly together inside the same project
-->
---
layout: cover
background: /celery.svg
@ -589,6 +910,16 @@ background: /celery.svg
}
</style>
<!--
- Is this the end for Celery and the like?
- Not at all!
- If you're using Celery, you've not made a mistake
- It's a great choice
- They have quite a head start
- This is much more about usability and flexibility
- If you need certain features, keep using them!
- But, now you have the option of a nice, Django-native API
-->
---
layout: image-right
@ -608,6 +939,15 @@ class: flex justify-center flex-col text-xl
- Swappable argument serialization
- ...
<!--
- The world of background workers is huge
- Not everything is making it into the initial version(s)
- And that's ok!
- Existing libraries have a head start
- But I hope we can slowly catch them up
- Bringing the stability and longevity guarantees that come with Django
-->
---
layout: cover
background: https://images.unsplash.com/photo-1519187903022-c0055ec4036a?q=80&w=1335&auto=format&fit=crop&ixlib=rb-4.0.3
@ -615,6 +955,19 @@ background: https://images.unsplash.com/photo-1519187903022-c0055ec4036a?q=80&w=
# The future is bright
<!--
- The future is bright though
- In time, I see more and more people reaching for `django.tasks`
- Moving work to the background will make Django apps _seem_ faster
- Improve throughput
- Reduce latency
- Improve reliability
- Gone are the days of needing additional research and testing to find the tooling you need
- You can use the ones built into Django
- And as you scale, if you find you need to swap something out, you can
- _without_ rewriting half your application
-->
---
layout: section
---
@ -630,6 +983,21 @@ layout: section
/>
</div>
<!--
- Time to turn the dream into a reality!
- If you've realised you could use a background queue, give `django_tasks` a try
- Test it out
- Report back your issues
- Suggest improvements
- If you want to get involved, please do!
- There's plenty of work to do
- And I can't do it alone!
- If you maintain a worker library
- Or have been burned by one
- Let's chat!
-->
---
layout: end
---