An intermediary attempt, sitting between the Wagtail RFC and the DEP https://github.com/django/deps/pull/86
This repository has been archived on 2024-02-09. You can view files and clone it, but cannot push or open issues or pull requests.

django-background-worker-rfc

  • Author: Jake Howard, with help from the Wagtail Performance sub-team
  • Created: 2024-02-07

For discussions, check out the associated PR and Discussions.

Abstract

This proposal started life as RFC 72 for Wagtail, but has been adapted to be generic to Django.

Django currently doesn't have a first-party solution for long-running tasks. Other frameworks such as Laravel have background workers, allowing them to push tasks into the background to be processed at a later date, without requiring the end user to wait for them to occur.

One of the key goals behind this proposal is to stop users waiting on work they don't need to wait for, by moving computation and complexity out of the request-response cycle and into dedicated background worker processes.

This proposed implementation deliberately assumes nothing about the user's setup. This not only reduces the chances of Django conflicting with task systems a user may already be running (e.g. Celery, RQ), but also allows it to work with almost any hosting environment.

Background

A prime example of this kind of improvement is re-indexing pages in Wagtail (where this RFC began). Currently, when a user publishes a page, the "Publish" action also re-indexes the page, so changes are reflected in the search index, which slows down the request unnecessarily. The user doesn't need to wait for the indexes to be updated, meaning they could continue with whatever they need to do next faster. By moving tasks into the background, it also means longer tasks don't tie up the application server, meaning it should be able to handle more traffic.

Other CMSs such as WordPress and Drupal have background workers to accelerate these kinds of non-blocking tasks. These APIs allow both the tools themselves and their users to push tasks to the background.

Requirements

This feature has some basic requirements for it to be considered "complete":

  • Django's background tasks should be opt-in, and Django should function as it does now without them.
  • Users should be able to choose between running a persistent background process and periodic execution with cron.
  • Users should have multiple options for task backends, depending on their scale and hosting environment.
  • Users should be able to easily add their own tasks to be executed.
  • Tasks should be able to specify a priority, so they can be executed sooner, regardless of when they were submitted.
  • Users should need to neither know nor care about the specific implementation details. This includes which backend is being used (mostly applicable to library authors).

Implementation

The proposed implementation will take the form of an application-wide "Job backend". This backend connects Django to the task runners. It provides an interface through which third-party libraries or application developers specify how jobs are created and pushed into the background.

The default backend will not push jobs into the background, instead running them in-band. This means the change is backwards compatible with current Django, and can be adopted slowly.
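As a rough illustration of what running "in-band" means, the sketch below executes the job at the moment it is enqueued. `BaseJobBackend` here is a local stand-in written for this example (`django.contrib.tasks` is only proposed, not an existing module), and the default argument values are an assumption.

```python
from typing import Any, Callable, Dict, List, Optional


class BaseJobBackend:
    """Local stand-in for the proposed base class; not a real Django API."""

    def __init__(self, options: Dict[str, Any]):
        self.options = options


class ImmediateBackend(BaseJobBackend):
    """Runs each job synchronously, inside the caller's thread."""

    def enqueue(
        self,
        func: Callable,
        priority: Optional[int] = None,
        args: Optional[List] = None,
        kwargs: Optional[Dict] = None,
    ) -> None:
        # Nothing is queued: the job has finished by the time enqueue returns.
        func(*(args or []), **(kwargs or {}))


results = []
backend = ImmediateBackend(options={})
backend.enqueue(results.append, args=["indexed"])
```

Because the job has already run when `enqueue` returns, code written against this backend behaves as it would without any task system.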

Django will also ship with an ORM-powered backend that users can transition to easily, should they want the power of a background worker without a large infrastructure migration or commitment. Whilst this backend will be designed to be performant enough, and considered fine for production use, it won't be designed to scale to all needs, and so tools such as Redis still have a place.

Whilst this proposal also covers scheduled tasks for Django, support for them should be opt-in, separately from enqueueing tasks.

Proposed API

Backends

A backend will be a class which extends a Django-defined base class.

from datetime import datetime
from typing import Callable, Dict, List

from django.contrib.tasks import BaseJobBackend

class MyBackend(BaseJobBackend):
    def __init__(self, options: Dict):
        """
        Any connections which need to be setup can be done here
        """
        super().__init__(options)

    def enqueue(self, func: Callable, priority: int | None, args: List, kwargs: Dict) -> None:
        """
        Queue up a job function to be executed
        """
        ...

    def defer(self, func: Callable, priority: int | None, when: datetime, args: List, kwargs: Dict) -> None:
        """
        Add a job to be completed at a specific time
        """
        ...

    async def aenqueue(self, func: Callable, priority: int | None, args: List, kwargs: Dict) -> None:
        """
        Queue up a job function (or coroutine) to be executed
        """
        ...

    async def adefer(self, func: Callable, priority: int | None, when: datetime, args: List, kwargs: Dict) -> None:
        """
        Add a job function (or coroutine) to be completed at a specific time
        """
        ...

If a backend doesn't support a particular scheduling mode, it simply does not define the method.
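One way calling code could detect a missing method is sketched below. It assumes the base class does not itself define stub versions of the methods; `EnqueueOnlyBackend` and `supports` are hypothetical names used only for this example.

```python
from typing import Callable


class EnqueueOnlyBackend:
    """Hypothetical backend that supports enqueue but not defer."""

    def __init__(self):
        self.jobs = []

    def enqueue(self, func: Callable, priority=None, args=None, kwargs=None) -> None:
        self.jobs.append((func, args or [], kwargs or {}))


def supports(backend: object, method_name: str) -> bool:
    """Return True if the backend defines a callable with this name."""
    return callable(getattr(backend, method_name, None))


backend = EnqueueOnlyBackend()
can_enqueue = supports(backend, "enqueue")
can_defer = supports(backend, "defer")
```

A library could use a check like this to fall back to `enqueue`, or to fail with a clear message, when the configured backend cannot defer.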

Running tasks

Similarly to Django's caching framework, a global "background" object can be imported, which is used to add tasks.

from django.contrib.tasks import background

def do_a_task(*args, **kwargs):
    pass

# Submit the task to be run
background.enqueue(do_a_task, args=[], kwargs={})

In the initial implementation, enqueue will not return anything. A response object which tracks in-flight tasks may be possible in a future iteration.

args and kwargs are intentionally their own dedicated arguments to make the API simpler and backwards-compatible should other attributes be added in future.

Deferring tasks

Tasks may also be "deferred" to run at a specific time:

background.defer(do_a_task, args=[], kwargs={}, when=datetime.datetime.now() + datetime.timedelta(minutes=5))

A deferred task may not run at exactly the requested time, but it should run within a few seconds of it.
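The sketch below shows how a worker might pick up deferred jobs once they are due. It is an in-memory illustration only: `InMemoryDeferredQueue`, `DeferredJob` and `run_due` are hypothetical names, and the rule that a larger priority number runs first is an assumption, since the proposal does not define priority ordering.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class DeferredJob:
    func: Callable
    when: datetime
    priority: int = 0
    args: List = field(default_factory=list)
    kwargs: Dict = field(default_factory=dict)


class InMemoryDeferredQueue:
    """Hypothetical store that a worker could poll every second or so."""

    def __init__(self) -> None:
        self._jobs: List[DeferredJob] = []

    def defer(self, func, when, priority: Optional[int] = None, args=None, kwargs=None) -> None:
        self._jobs.append(DeferredJob(func, when, priority or 0, args or [], kwargs or {}))

    def run_due(self, now: datetime) -> int:
        """Run every job whose time has passed; return how many ran."""
        due = [job for job in self._jobs if job.when <= now]
        self._jobs = [job for job in self._jobs if job.when > now]
        # Assumption: a larger priority number runs first; ties go to the earlier time.
        due.sort(key=lambda job: (-job.priority, job.when))
        for job in due:
            job.func(*job.args, **job.kwargs)
        return len(due)


start = datetime(2024, 2, 7, 9, 0, 0)
ran = []
queue = InMemoryDeferredQueue()
queue.defer(ran.append, when=start + timedelta(minutes=5), args=["low"])
queue.defer(ran.append, when=start + timedelta(minutes=5), priority=10, args=["high"])
queue.defer(ran.append, when=start + timedelta(hours=1), args=["later"])

count_early = queue.run_due(start)                       # nothing is due yet
count_due = queue.run_due(start + timedelta(minutes=5))  # the two 5-minute jobs
```

How often the worker polls bounds how late a job can start, which is where the "within a few seconds" tolerance comes from in a design like this.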

Background signals

For Django signals, an additional argument can be passed when registering a receiver. It transparently converts the receiver into a task and submits it to the background whenever the signal is sent. Only certain signals will support background receivers. Others, such as those for requests or migrations, must run synchronously, so registering a background receiver for them will raise an error.

from django.core.signals import request_started
from django.db.models.signals import pre_save
from django.dispatch import receiver


@receiver(pre_save, sender=MyModel, background=True)
def my_callback(sender, **kwargs):
    print("Your model is about to be saved!")


# Raises an error when defined
@receiver(request_started, background=True)
def on_request(sender, **kwargs):
    print("This will never be registered")

It will still be possible to call background.enqueue etc. manually inside a signal receiver; the background argument is merely a convenience wrapper.
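A minimal sketch of what such a wrapper could do is shown below, without depending on Django. `RecordingBackend` and `in_background` are hypothetical names for this example, and the final call simulates a signal being sent rather than using a real Signal object.

```python
import functools
from typing import Callable


class RecordingBackend:
    """Hypothetical backend that only records what was enqueued."""

    def __init__(self):
        self.queued = []

    def enqueue(self, func: Callable, priority=None, args=None, kwargs=None) -> None:
        self.queued.append((func, args or [], kwargs or {}))


def in_background(backend) -> Callable:
    """Wrap a receiver so that calling it enqueues the real work instead."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(sender, **kwargs):
            # The signal dispatcher calls wrapper; the original runs later.
            backend.enqueue(func, args=[sender], kwargs=kwargs)

        return wrapper

    return decorator


background = RecordingBackend()


@in_background(background)
def my_callback(sender, **kwargs):
    return f"handled {sender}"


# Simulates what sending a signal would do with a connected receiver.
my_callback("MyModel", created=True)
queued_func, queued_args, queued_kwargs = background.queued[0]
```

A real backend that hands jobs to another process would also need the sender and keyword arguments to be serialisable, which this sketch does not address.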

Whilst it's possible to define tasks anywhere in an application, the convention will be to put them in a tasks.py file, and manually import them during AppConfig.ready.

Async tasks

Where the underlying implementation supports it, backends may also provide an async-compatible interface for scheduling tasks, using a-prefixed methods:

await background.aenqueue(do_a_task, args=[], kwargs={})

Here, do_a_task can either be a regular function or coroutine.
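One way a backend could accept both kinds of callable is sketched here. `ImmediateAsyncBackend` is a hypothetical backend that runs the job straight away; coroutine functions are awaited directly, and regular functions are handed to a thread so they do not block the event loop.

```python
import asyncio
import inspect


class ImmediateAsyncBackend:
    """Hypothetical backend that runs the job as soon as it is enqueued."""

    async def aenqueue(self, func, priority=None, args=None, kwargs=None) -> None:
        args, kwargs = args or [], kwargs or {}
        if inspect.iscoroutinefunction(func):
            await func(*args, **kwargs)
        else:
            # Run blocking callables in a thread so the event loop stays free.
            await asyncio.to_thread(func, *args, **kwargs)


calls = []


def sync_task(name):
    calls.append(f"sync:{name}")


async def async_task(name):
    calls.append(f"async:{name}")


async def main():
    backend = ImmediateAsyncBackend()
    await backend.aenqueue(sync_task, args=["a"])
    await backend.aenqueue(async_task, args=["b"])


asyncio.run(main())
```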

Settings

The values shown below are the defaults.

TASKS = {
    "BACKEND": "django.contrib.tasks.backends.ImmediateBackend",
    "OPTIONS": {}  # Passed to the backend's constructor
}

Note that only a single backend may be defined for an application. Future iterations may extend this to support multiple backends.
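The sketch below shows how the setting might be resolved into a backend instance. `load_backend` and the `demo_backends` module are hypothetical and exist only to make the example self-contained; within Django, `django.utils.module_loading.import_string` performs the same dotted-path lookup.

```python
import importlib
import sys
import types
from typing import Any, Dict


class ImmediateBackend:
    """Stand-in backend for this example."""

    def __init__(self, options: Dict[str, Any]):
        self.options = options


# Register a throwaway module so the dotted path below can be imported.
demo_module = types.ModuleType("demo_backends")
demo_module.ImmediateBackend = ImmediateBackend
sys.modules["demo_backends"] = demo_module


def load_backend(tasks_setting: Dict[str, Any]):
    """Import the class named by BACKEND and instantiate it with OPTIONS."""
    module_path, _, class_name = tasks_setting["BACKEND"].rpartition(".")
    backend_class = getattr(importlib.import_module(module_path), class_name)
    return backend_class(tasks_setting.get("OPTIONS", {}))


TASKS = {
    "BACKEND": "demo_backends.ImmediateBackend",
    "OPTIONS": {"example": True},
}
background = load_backend(TASKS)
```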

Checks

To ensure the configuration is valid, a check will be added to validate:

  • If TASKS is set, does the backend exist?
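A sketch of the validation logic follows, written as a plain function so it runs without Django. `check_tasks_setting` is a hypothetical name; in Django the same logic would likely be registered with `django.core.checks.register` and return `django.core.checks.Error` instances rather than strings.

```python
import importlib
from typing import Any, Dict, List


def check_tasks_setting(tasks_setting: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the setting looks valid."""
    dotted_path = tasks_setting.get("BACKEND", "")
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        return [f"TASKS['BACKEND'] must be a dotted path, got {dotted_path!r}."]
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return [f"Could not import module {module_path!r}."]
    if not hasattr(module, class_name):
        return [f"Module {module_path!r} has no attribute {class_name!r}."]
    return []


# Standard-library paths stand in for real backends here.
ok = check_tasks_setting({"BACKEND": "collections.OrderedDict"})
missing_class = check_tasks_setting({"BACKEND": "collections.NoSuchBackend"})
missing_module = check_tasks_setting({"BACKEND": "no_such_package_for_demo.Backend"})
not_dotted = check_tasks_setting({"BACKEND": "Backend"})
```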

Current workarounds

Currently, a user is required to integrate their own background worker functionality, using external tools such as Celery or RQ.

Library maintainers currently must resort to introspecting their runtime, or defining hooks that allow integrators to defer certain code paths to the background.