diff --git a/.spelling b/.spelling
index c78eb7a..61dfe91 100644
--- a/.spelling
+++ b/.spelling
@@ -137,3 +137,11 @@ unliking
 Analytica
 InfoSec
 OOP-purist
+2.x
+LTS-based
+regex
+django
+MD5-based
+backend
+queryset
+lifecycle
diff --git a/content/posts/django-22.md b/content/posts/django-22.md
new file mode 100644
index 0000000..95f77ca
--- /dev/null
+++ b/content/posts/django-22.md
@@ -0,0 +1,216 @@
+---
+title: Django 2.2
+date: 2019-04-01
+---
+
+April marks the release of Django 2.2, the latest LTS version of the popular Python web framework. Django 2.2 represents almost 2 years of development since the last LTS release, 1.11 in April 2017, and brings with it some very large improvements and changes, as you'd expect from a major version increase.
+
+Django has historically followed an LTS release pattern, providing 2 channels: regular releases and LTS releases. The LTS versions are maintained for longer than the regular versions, and receive bug fixes and security patches in line with the main release channel.
+
+![Django update cycle](https://static.djangoproject.com/img/release-roadmap.e844db08610e.png)
+
+The bump between 1.11 and 2.2 also brought with it the updates from 2.0 and 2.1. Features which users of those releases have had for 18 months finally reach those who need the stability of an LTS release. I've not delved too far into the 2.x releases so far, as most of what I do strongly benefits from using an LTS-based version.
+
+
+## Python 2
+
+Ironically named, Django 2 is the first Django release to completely drop support for Python 2. Django 2.2 requires at least Python 3.5. Python 2 (commonly referred to as 'legacy Python') will retire [in 2020](https://pythonclock.org/), so it's great to see Django dropping support well beforehand so users can start migrating their larger codebases. For years there's been a debate as to which major version of Python is better: 2 or 3.
Considering Python 2 now has an end-of-life date, and the performance gap is now a non-issue, hopefully this debate is over.
+
+## Simplified URL Routing
+
+In previous versions, Django's URL system relied heavily on regular expressions to match paths to views. This works fine for very simple data types (like integers, which can simply be `\d+`), but more complex data types lead to more interesting URL patterns. In the past, I've had to resort to a simpler URL pattern, and then do more validation in the view, which is less than ideal. UUIDs were famously very difficult to do, with many people resorting to `[0-9a-f-]+`, whereas the correct regex is in fact [`[0-9a-f]{12}4[0-9a-f]{3}[89ab][0-9a-f]{15}\Z`](https://stackoverflow.com/a/18359032).
+
+Thankfully, Django 2.0 fixed this, by drastically simplifying the URL routing syntax so that typed converters can be used in place of regex capture groups. The new syntax is available with the new `path` function:
+
+
+```python
+path('articles/<int:year>/', views.year_archive),
+```
+
+This means UUID-based paths can now be written as, for example:
+
+```python
+path('articles/<uuid:id>/', views.year_archive),
+```
+
+This is a significant improvement over the previous methods. The regex-based behaviour is still available as `re_path`; the previous `url` function is now an alias for it and is likely to be deprecated, but there's no date for when that will happen. There's support for the following shorthand types:
+
+- `str`
+- `int`
+- `slug`
+- `uuid`
+- `path` (any non-empty string, including `/`)
+
+## Django Admin
+
+
+### Mobile-friendly
+
+The Django admin is now mobile-friendly. This isn't a massive deal considering how few people are likely using it in production, but given it's a fairly useful administration panel for smaller (say, self-hosted) projects, it's nice to see it getting some much-needed UI love.
+
+### Auto-complete
+
+I've personally had to wrestle quite a lot with performance in the Django Admin caused solely by foreign key and many-to-many fields.
By default, the Django Admin renders these as `<select>` elements, which means loading every possible option up front. With the new `autocomplete_fields` option, it instead renders a custom widget which only searches and populates the search results when the user interacts with it. This will greatly improve performance in large forms.
+
+## Stronger password hashes
+
+Django hashes passwords with PBKDF2, using SHA256 as the underlying hash function, to make brute-forcing a leaked hash as slow as possible. I don't want to go into what those are and why they're there now, but trust that it's a very strong hash.
+
+Django 2.0 increases the number of PBKDF2 iterations from 36000 to 100000. This is a very large increase in iterations, and is meant to increase to 180000 rounds in Django 3.0. This only affects new users, or existing users when they next log in or change their passwords. For someone like me, this is great news!
+
+This may have an impact on how long tests take to run, if they're constantly creating and destroying users, as every extra iteration makes each password hash slower to calculate, which is compounded by needing to run it many times in test cases. I've seen 25% improvements in test speed just by swapping to an MD5-based hashing backend for tests, which doesn't use PBKDF2 at all.
+
+## Files can be opened as context managers
+
+Anyone who's opened files with Python will have seen the context manager pattern, and hopefully understands why it's significantly better than opening and closing the file handle manually. This pattern can now be used with Django files from file / image fields. This results in slightly cleaner code which is less prone to leaving handles open to files which aren't needed anymore.
+
+## New Database functions
+
+One of the largest changes in Django 2.0 - 2.2 is the plethora of database functions added. These database functions allow more complex queries than were previously possible, enabling more computation to be done by the database, rather than pulling all the data into Python land and operating on it there.
+
+As with many other things in Django, said functions are named fairly well:
+
+### 2.0
+- `StrIndex`
+
+### 2.1
+- `Chr`
+- `Left`
+- `LPad`
+- `LTrim`
+- `Ord`
+- `Repeat`
+- `Replace`
+- `Right`
+- `RPad`
+- `RTrim`
+- `Trim`
+
+### 2.2
+- `Reverse`
+- `NullIf`
+- `Abs`
+- `ACos`
+- `ASin`
+- `ATan`
+- `ATan2`
+- `Ceil`
+- `Cos`
+- `Cot`
+- `Degrees`
+- `Exp`
+- `Floor`
+- `Ln`
+- `Log`
+- `Mod`
+- `Pi`
+- `Power`
+- `Radians`
+- `Round`
+- `Sin`
+- `Sqrt`
+- `Tan`
+- `ExtractIsoYear`
+
+With all these new functions, focusing on maths and string manipulation, database servers can be leveraged more, and less data returned to the application server. Exactly how useful these will be to most users, only time will tell.
+
+## `QuerySet` API
+
+### `QuerySet.iterator` chunk size
+
+`QuerySet.iterator` is an efficient way of working through very large datasets in Django. Simply iterating over a queryset loads the entire result set into memory, and then iterates over it. `iterator` uses server-side cursors (where the database supports them) to fetch the data in chunks, so a much smaller amount of data is stored in memory at once. The ability to specify a chunk size allows tuning of this to improve performance. The default is 2000, which represents something [close to how it worked before](https://www.postgresql.org/message-id/4D2F2C71.8080805%40dndg.it).
+
+### `QuerySet.values_list` can return named tuples
+
+Named tuples are much like tuples, but their keys are, well, named! This is a much lighter data structure than a dictionary, and is also immutable. `values_list` being able to return these means the returned objects can be destructured in a much nicer way, and allow stronger type hinting.
+
+### `QuerySet.explain`, explained.
+
+The new `explain` method on a `QuerySet` hooks into the existing SQL `EXPLAIN` statement to provide additional execution detail on queries. This may be useful to diagnose slow-running queries and attempt to optimise them.
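
To get a feel for what `explain` surfaces, here's a rough, Django-free sketch using SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module. The `article` table, the `idx_article_year` index and the `explain_query` helper are all made up for illustration; in a real project you'd call `.explain()` on a queryset instead, and the output format depends entirely on your database.

```python
import sqlite3


def explain_query(conn, sql, params=()):
    """Ask SQLite how it would run `sql`, without actually running it."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable description of a step.
    return [row[-1] for row in rows]


# Hypothetical schema, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
)
conn.execute("CREATE INDEX idx_article_year ON article (year)")

plan = explain_query(conn, "SELECT title FROM article WHERE year = ?", (2019,))
for step in plan:
    print(step)
```

The printed plan should mention `idx_article_year`, which is the kind of hint you're looking for: the filter is being served by an index rather than a scan of the whole table.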
+
+### `QuerySet.bulk_update`
+
+Updating many model instances at once previously required either a single `update()` query (which applies the same change to every row), or saving each instance in a loop, resulting in `O(n)` queries. Neither of these is ideal. Django 2.2 introduces the `bulk_update` method, which takes a list of modified model instances, and saves them, generally in a single query.
+
+`bulk_update` needs to be told which fields it's updating as the second argument, so if different instances have had different fields modified, this may not be ideal.
+
+Personally, I can't think of many places this will be necessary, but I'm sure someone can!
+
+## `createsuperuser` password validators
+
+Django 1.9 added support for password validators, which can be used to measure the strength of users' passwords against pre-defined requirements. Often during development, these may not be useful, and may be annoying. `createsuperuser` now asks whether these validators should be bypassed, allowing super users to have weaker passwords for development.
+
+Even though this exists, please don't use it in production!
+
+## Secure JSON serialization into HTML
+
+Anyone who's had to dump JSON blobs into HTML pages should have come across [`django-argonauts`](https://github.com/fusionbox/django-argonauts) (if you're doing this _without_ `django-argonauts`, fear). `django-argonauts` helps prevent multiple different classes of XSS attacks; there's more information on this in the [project's README](https://github.com/fusionbox/django-argonauts#filter).
+
+Django now has some built-in support for protecting against these kinds of attacks, from the new `json_script` filter. This takes an object in template context, serializes it to JSON (securely), and wraps it in a `script` tag, resulting in something like:
+
+```html
+<script id="hello-data" type="application/json">{"hello": "world"}</script>
+```
+
+This can then be used by JavaScript directly by getting the tag by ID. If you still need to inject data directly into JavaScript source, `django-argonauts` still provides additional functionality.
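
For the curious, the escaping involved is small enough to sketch with the standard library. This is an approximation of the idea, not Django's actual implementation: the `escape_json` and `json_for_script_tag` names are hypothetical, and in a template you'd just use the `json_script` filter.

```python
import html
import json

# Characters which could let a JSON payload break out of a <script> element.
SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def escape_json(value):
    """Serialize to JSON, swapping HTML-significant characters for unicode escapes."""
    return json.dumps(value).translate(SCRIPT_ESCAPES)


def json_for_script_tag(value, element_id):
    """Wrap the escaped JSON in a non-executable script tag (hypothetical helper)."""
    return '<script id="{}" type="application/json">{}</script>'.format(
        html.escape(element_id, quote=True), escape_json(value)
    )


# A payload which would close the tag early if it were dumped in unescaped.
tag = json_for_script_tag({"note": "</script><b>hi</b>"}, "hello-data")
print(tag)
```

Because the replacements are themselves valid JSON escapes, parsing the tag's text content on the JavaScript side gives back the original data, with nothing in the payload able to close the tag early.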
+
+
+## Constraints
+
+The new constraints API in Django 2.2 allows for far greater control of database-level validation than was previously available through field validators. Constraints are declared at the model level, rather than the field level. Django 2.2 ships with 2 constraints: `UniqueConstraint` and `CheckConstraint`. Both are enforced by the database itself: they're created on the table through migrations, so a violation surfaces as an `IntegrityError` when saving, rather than as a validation error. This makes them much harder to bypass than Python-level validators, but also means the error only appears once the query hits the database.
+
+`UniqueConstraint` creates a unique constraint with any number of fields, in much the same way `unique_together` worked. `UniqueConstraint` also provides an additional `condition` argument, which takes a `Q` object restricting which rows the constraint applies to. For example, `UniqueConstraint(fields=['user'], condition=Q(status='DRAFT'), name='unique_draft_user')` ensures that each user only has one draft.
+
+`CheckConstraint` works much like standard field-level validators, however it can work on multiple fields at once, as it uses `Q` objects to specify the check.
+
+## No more headers in migrations
+
+Whenever `manage.py makemigrations` is run, Django injects a header into the file with the Django version and the date it was generated.
+
+```plain
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.1 on 2017-06-07 16:10
+```
+
+The new `--no-header` argument removes this when generating new migrations. In a typical workflow, there's little reason to remove this, but it's nice there's the option now!
+
+## Migration planning
+
+When executing migrations, especially in a production environment, it's useful to know which migrations are going to run. This is especially useful when certain migrations may require some site downtime. Previously, it was possible to see the migrations to run by using `manage.py showmigrations | grep -F "[ ]"`, but this is less than ideal, and a bit of a hack.
+
+Django 2.2 adds the ability to see the migrations before they are executed, rather than having to roll this functionality yourself. This is done using the new `--plan` flag on `manage.py migrate`.
+
+## `request.headers`
+
+Previously, `request.META` gave access to HTTP headers, in a slightly weird way.
+
+> any HTTP headers in the request are converted to META keys by converting all characters to upper-case, replacing any hyphens with underscores and adding an `HTTP_` prefix to the name. So, for example, a header called `X-Bender` would be mapped to the META key `HTTP_X_BENDER`.
+
+For anyone who's worked with raw HTTP headers in the past, this is a little weird. Now, `request` objects have a `headers` attribute which provides a far saner API over the raw request headers. As all header lookups should be, it's case-insensitive!
+
+## Use of `sqlparse`
+
+In previous versions, `sqlparse` was an optional extra, and Django carried its own code for some of its SQL handling. Django 2.2 makes `sqlparse` a required dependency, to simplify a few parts of Django's database handling. `sqlparse` is a non-validating SQL parser, which splits and tokenises SQL text into Python objects. This doesn't extract Django's ORM into an external package, it just removes a small section of it in favour of an existing library.
+
+Using an external library brings with it many benefits. There's now less code inside the core Django codebase, meaning there's less for the core developers to manage and tie in to Django's release cycle. **(Wild speculation alert!)** It also _might_ mean it gets faster. Society is built on specialisation, therefore hopefully a library designed to do SQL parsing will be faster and more robust than code originally written just for Django.
+
+## Watchman
+
+[Watchman](https://facebook.github.io/watchman/) is a technology from Facebook which enables efficient and powerful file watching in a directory.
Django now has the ability to use this when doing live code reload in the dev server, rather than the pure-Python alternative. This will give a massive performance improvement on large codebases, and use fewer resources while doing so.
+
+Watchman support isn't enabled by default. It requires an additional optional dependency, `pywatchman`, to operate.
+
+## Database instrumentation
+
+Django supports many different ways of hooking into the querying and model lifecycle, from executing arbitrary SQL, to using signals to listen for specific model events. Django 2.0
+introduces instrumentation, which allows wrapper code to be executed around each query, for logging, modifying, or blocking queries outright.
+
+An interesting use for this would be explicitly disabling queries in certain parts of the code, with [`django-zen-queries`](https://github.com/dabapps/django-zen-queries) (ships in https://github.com/dabapps/django-zen-queries/pull/12).
+
+## Upgrading
+
+With Django 2.2 now released, it's time to actually start upgrading. Django 1.11 stops receiving support in April 2020, so large, complex codebases don't have long! The next LTS version, 3.2, is due in April 2021. Who knows what Django will look like then!
+
+(On a complete tangent, don't do large software releases on April 1st!)