From 9afc4b07456211ba7b5bcee9f5e1848043f3a74d Mon Sep 17 00:00:00 2001
From: Jake Howard
Date: Wed, 7 Feb 2024 09:07:08 +0000
Subject: [PATCH] Adapt proposal to not be Wagtail specific

---
 README.md | 100 ++++++++++++++++++++++++------------------------------
 1 file changed, 45 insertions(+), 55 deletions(-)

diff --git a/README.md b/README.md
index 665e91c..065e783 100644
--- a/README.md
+++ b/README.md
@@ -7,17 +7,15 @@ For discussions, check out the [associated PR](https://github.com/RealOrangeOne/
 
 ## Abstract
 
-Wagtail currently doesn't have a first-party solution for long-running tasks. Other CMSs in the ecosystem such as WordPress and Drupal have background workers, allowing them to push tasks into the background to be processed at a later date, without requiring the end user to wait for them to occur.
+Django currently doesn't have a first-party solution for long-running tasks. Other frameworks such as Laravel have background workers, allowing them to push tasks into the background to be processed at a later date, without requiring the end user to wait for them to occur.
 
-One of the key goals behind this proposal is removing the requirement for the user to wait for tasks they don't need to.
+One of the key goals behind this proposal is removing the requirement for the user to wait for tasks they don't need to, moving computation and complexity out of the request-response cycle, towards dedicated background worker processes.
 
-This proposed implementation specifically doesn't assume anything about the user's setup. This not only reduces the chances of Wagtail conflicting with any existing task system implemented by applications, but also allows it to work with almost any hosting environment a user might be using.
+This proposed implementation specifically doesn't assume anything about the user's setup. This not only reduces the chances of Django conflicting with existing task systems a user may be using (eg Celery, RQ), but also allows it to work with almost any hosting environment a user might be using.
 
 ## Background
 
-Some tasks done as part of certain Wagtail requests don't need to block the user, and could instead be pushed to the background, improving the perceived responsiveness of the application. Having a first-party solution would also remove the need for downstream users to build a background worker pipeline themselves.
-
-A prime example of this kind of improvement is re-indexing pages. Currently, when a user publishes a page, the "Publish" action also re-indexes the page, which slows down the request unnecessarily. The user doesn't need to wait for the indexes to be updated, meaning they could continue with whatever they need to do next faster. By moving tasks into the background, it also means longer tasks don't tie up the application server, meaning it should be able to handle more editor traffic.
+A prime example of this kind of improvement is re-indexing pages in Wagtail (where this RFC began). Currently, when a user publishes a page, the "Publish" action also re-indexes the page, so changes are reflected in the search index, which slows down the request unnecessarily. The user doesn't need to wait for the indexes to be updated, meaning they could continue with whatever they need to do next faster. By moving tasks into the background, it also means longer tasks don't tie up the application server, meaning it should be able to handle more traffic.
 
 Other CMSs such as WordPress and Drupal have background workers to accelerate these kinds of non-blocking tasks. These APIs allow both for the tools themselves to push tasks to the background, but also for users to submit tasks themselves.
 
@@ -25,34 +23,34 @@ Other CMSs such as WordPress and Drupal have background workers to accelerate th
 
 This feature has some basic requirements for it to be considered "complete":
 
-- Wagtail's background tasks should be opt-in, and Wagtail should function as it does now without it.
+- Django's background tasks should be opt-in, and Django should function as it does now without it
 - Users should be able to choose from either running a persistent background process, or periodic execution with cron
-- Users should have multiple options for task backends, depending on their scale and hosting environment. By default, Redis and Django's ORM should be supported.
-- Users should be able to easily add their own tasks to be executed, whether through Wagtail hooks or entirely manually.
+- Users should have multiple options for task backends, depending on their scale and hosting environment.
+- Users should be able to easily add their own tasks to be executed.
 - Tasks should be able to specify a priority, so they can be executed sooner, regardless of when they were submitted.
-- Users should need to neither know nor care about the specific implementation details. This includes both the implementation details, and which backend is being used (mostly applicable to library authors)
+- Users should need to neither know nor care about the specific implementation details. This includes which backend is being used (mostly applicable to library authors)
 
 ## Implementation
 
-The proposed implementation will be in the form of an application wide "Job backend". This backend will be what connects Wagtail to the task runners. The job backend will provide an interface for either 3rd party libraries, or application developers to specify how jobs should be created and pushed into the background
+The proposed implementation will be in the form of an application wide "Job backend". This backend will be what connects Django to the task runners. The job backend will provide an interface for either third-party libraries, or application developers to specify how jobs should be created and pushed into the background
 
-The default backend will not push jobs into the background, instead running them in-band. This means the change is backwards compatible with current wagtail.
+The default backend will not push jobs into the background, instead running them in-band. This means the change is backwards compatible with current Django, and can be adopted slowly.
 
-Wagtail will also ship with an ORM-powered backend for users to transition too easy should they want the powers of a background worker without a large infrastructure migration or commitment. Whilst this backend will be designed to be performant enough, and considered fine for production use, it won't be designed to fully scale to all needs, and so tools such as Redis still have a place.
+Django will also ship with an ORM-powered backend for users to transition to easily should they want the powers of a background worker without a large infrastructure migration or commitment. Whilst this backend will be designed to be performant enough, and considered fine for production use, it won't be designed to fully scale to all needs, and so tools such as Redis still have a place.
 
-Whilst this proposal also covers scheduled tasks for Wagtail, enabling those should be opted in separately to task scheduling.
+Whilst this proposal also covers scheduled tasks for Django, enabling those should be opted in separately to task scheduling.
 
 ## Proposed API
 
 ### Backends
 
-A backend will be a class which extends a wagtail-defined base class.
+A backend will be a class which extends a Django-defined base class.
 
 ```python
 from datetime import datetime
 from typing import Callable, Dict, List
 
-from wagtail.contrib.tasks import BaseJobBackend
+from django.contrib.tasks import BaseJobBackend
 
 class MyBackend(BaseJobBackend):
     def __init__(self, options: Dict):
@@ -61,25 +59,25 @@ class MyBackend(BaseJobBackend):
         """
         super().__init__(options)
 
-    def enqueue(self, func: Callable, args: List, kwargs: Dict) -> None:
+    def enqueue(self, func: Callable, priority: int | None, args: List, kwargs: Dict) -> None:
         """
         Queue up a job function to be executed
         """
         ...
 
-    def defer(self, func: Callable, when: datetime, args: List, kwargs: Dict) -> None:
+    def defer(self, func: Callable, priority: int | None, when: datetime, args: List, kwargs: Dict) -> None:
         """
         Add a job to be completed at a specific time
         """
         ...
 
-    async def aenqueue(self, func: Callable, args: List, kwargs: Dict) -> None:
+    async def aenqueue(self, func: Callable, priority: int | None, args: List, kwargs: Dict) -> None:
         """
         Queue up a job function (or coroutine) to be executed
         """
         ...
 
-    async def adefer(self, func: Callable, when: datetime, args: List, kwargs: Dict) -> None:
+    async def adefer(self, func: Callable, priority: int | None, when: datetime, args: List, kwargs: Dict) -> None:
         """
         Add a job function (or coroutine) to be completed at a specific time
         """
@@ -93,7 +91,7 @@ If a backend doesn't support a particular scheduling mode, it simply does not de
 Similarly to Django's caching framework, a global "background" object can be imported, which is used to add tasks.
 
 ```python
-from wagtail.contrib.tasks import background
+from django.contrib.tasks import background
 
 def do_a_task(*args, **kwargs):
     pass
@@ -116,21 +114,30 @@ background.defer(do_a_task, args=[], kwargs={}, when=datetime.datetime.now() + d
 
 When scheduling a task, it may not be exactly that time a task is executed, however it should be accurate to within a few seconds.
 
-#### Background Hooks
+#### Background signals
 
-For Wagtail [hooks](https://docs.wagtail.io/en/stable/reference/hooks.html), there will be an additional property passed when registering the hook, which will transparently convert the hook to a task, and ensure it's submitted as a task when the hook should be called. Only certain hooks will support background tasks. Others, such as those for registering URLs or menu items must be run synchronously, and so will raise an error. This will only be applicable for certain hooks, and will do nothing when passed to these hooks.
+For Django [signals](https://docs.djangoproject.com/en/stable/ref/signals/), there will be an additional property passed when registering the signal, which will transparently convert the signal to a task, and ensure it's submitted as a task when the signal should be called. Only certain signals will support background tasks. Others, such as those for requests or migrations, must be run synchronously, and so will raise an error.
 
 ```python
-from wagtail.core import hooks
+from django.core.signals import request_started
+from django.db.models.signals import pre_save
+from django.dispatch import receiver
 
-@hooks.register('name_of_hook', background=True)
-def my_hook_function(arg1, arg2...)
-    pass
+
+@receiver(pre_save, sender=MyModel, background=True)
+def my_callback(sender, **kwargs):
+    print("Your model is about to be saved!")
+
+
+# Raises an error when defined
+@receiver(request_started, background=True)
+def on_request(sender, **kwargs):
+    print("This will never be registered")
 ```
 
-It will still be possible to manually run `background.enqueue` etc inside a hook method, this is merely a convenience wrapper.
+It will still be possible to manually run `background.enqueue` etc inside a signal method, this is merely a convenience wrapper.
 
-Whilst it's possible to define tasks anywhere in an application, the convention will be to put them alongside hooks in the `wagtail_hooks.py` file, to ensure they're imported at the right times.
+Whilst it's possible to define tasks anywhere in an application, the convention will be to put them in a `tasks.py` file, and manually import them during `AppConfig.ready`.
 
 #### Async tasks
 
@@ -140,46 +147,29 @@ Where the underlying implementation supports it, backends may also provide an `a
 await background.aenqueue(do_a_task, args=[], kwargs={})
 ```
 
+Here, `do_a_task` can either be a regular function or coroutine.
+
 ### Settings
 
 Values shown below are the default.
 
 ```python
-WAGTAIL_JOBS = {
-    "BACKEND": "wagtail.contrib.tasks.backends.ImmediateBackend",
-    "OPTIONS": {}  # To pass to the backend's constructor
+TASKS = {
+    "BACKEND": "django.contrib.tasks.backends.ImmediateBackend",
+    "OPTIONS": {}  # Passed to the backend's constructor
 }
 ```
 
-Note that only a single backend may be defined for an application. Should multiple task runners be required, this should be implemented in a custom backend.
+Note that only a single backend may be defined for an application. Future iterations may extend this to support multiple backends.
 
 ### Checks
 
 To ensure the configuration is valid, a check will be added to validate:
 
-- If `SCHEDULE` is set, does the backend support scheduling?
+- If `TASKS` is set, does the backend exist?
 
 ## Current workarounds
 
-For Wagtail's internals, it's currently not possible to control how these are executed. For user-controlled code, it's possible to implement a task queueing system separate from Wagtail, and manually submit tasks to it as needed.
+Currently, a user is required to integrate their own background worker functionality, using external tools such as Celery or RQ.
 
-## Implementation plan
-
-1. Create the basic plumbing and base classes required
-2. Implement (`async`-compatible) `ImmediateBackend`
-3. Enable creating wagtail hooks as tasks
-4. Implement an ORM backend
-6. Begin migrating background-compatible bits of Wagtail to tasks
-7. Documentation
-8. Initial release?
-9. Complete migrating background-compatible bits of Wagtail to tasks
-
-## Open Questions
-
-- How will / should this interact with the ongoing "Bulk Actions" work?
-- Is this _contrib_?
-- The background runner should probably have a name of some sort (and be consistent around terminology eg tasks)
-- Should Wagtail contain some additional backends for commonly used task runners?
-  - Or, should these be separate packages under the Wagtail organisation?
-- What should the argument for scheduling a task be? Should Wagtail parse it into a format more consumable for the backend?
-- Should the ORM backend use a custom task runner, or a 3rd-party package?
+Library maintainers currently must resort to introspecting their runtime, or defining hooks to allow integrators to defer certain code paths to the background
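
As a rough illustration of the backend interface this patch describes, the standalone sketch below shows how an in-band backend with the `enqueue` / `aenqueue` signatures from the "Backends" section might behave. It is only a sketch under stated assumptions: `BaseJobBackend` here is a local stand-in written for this example (the proposed `django.contrib.tasks` module is not imported), the `ImmediateBackend` body is a guess at the simplest behaviour consistent with "running them in-band", and `calls`, `do_a_task` and `do_an_async_task` are hypothetical names. The deferred variants (`defer` / `adefer`) are left out.

```python
import asyncio
import inspect
from typing import Any, Callable, Dict, List, Optional


class BaseJobBackend:
    """Local stand-in for the proposed base class (an assumption, not Django code)."""

    def __init__(self, options: Dict[str, Any]):
        # Options would come from the "OPTIONS" key of the TASKS setting.
        self.options = options


class ImmediateBackend(BaseJobBackend):
    """Runs every job in-band instead of pushing it to a worker."""

    def enqueue(self, func: Callable, priority: Optional[int], args: List, kwargs: Dict) -> None:
        # There is no queue, so the priority is accepted but has no effect.
        func(*args, **kwargs)

    async def aenqueue(self, func: Callable, priority: Optional[int], args: List, kwargs: Dict) -> None:
        result = func(*args, **kwargs)
        # A coroutine function returns an awaitable that still has to be awaited;
        # a regular function has already run by this point.
        if inspect.isawaitable(result):
            await result


# Records each invocation so the effect of the calls below can be inspected.
calls = []


def do_a_task(*args, **kwargs):
    calls.append((args, kwargs))


async def do_an_async_task(*args, **kwargs):
    calls.append((args, kwargs))


backend = ImmediateBackend(options={})
backend.enqueue(do_a_task, priority=None, args=[1], kwargs={"x": 2})
asyncio.run(backend.aenqueue(do_an_async_task, priority=None, args=[3], kwargs={}))
```

Because the jobs run synchronously inside the caller, both have completed by the time the calls return, which is what would make such a default backwards compatible with code that does not expect a worker process.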