title date
Django 2.2 2019-04-01

April marks the release of Django 2.2, the latest LTS version of the popular Python web framework. It arrives almost 2 years after the last LTS release, 1.11 in April 2017, and brings with it the large improvements and changes which naturally come with a major version increase.

Django follows the LTS pattern of software releases, providing 2 channels: regular and LTS. The LTS versions are maintained for longer than the regular versions, and receive regular bug fixes and security patches in line with the main release channel.

Django update cycle

The bump from 1.11 to 2.2 also brought with it the updates from 2.0 and 2.1. Features which others have been using for 18 months finally come to those who need the stability of an LTS release. I've not delved too far into the 2.x releases so far, as most of what I do strongly benefits from using an LTS-based version.

Python 2

Ironically named, Django 2 is the first Django release to completely drop support for Python 2. Django 2.2 requires at least Python 3.5. Python 2 (commonly referred to as 'legacy Python') will retire in 2020, so it's great to see Django dropping support well beforehand so users can start migrating their larger codebases. For years there's been a debate as to which major version of Python is better: 2 or 3. Considering Python 2 now has an end-of-life date, and the performance gap is now a non-issue, hopefully this debate is over.

Simplified URL Routing

In previous versions, Django's URL system relied heavily on regular expressions to match paths to views. This works fine for very simple data types (like integers, which can simply be \d+), but more complex data structures lead to more interesting URL patterns. In the past, I've had to resort to a simpler URL pattern, and then do more URL validation in the view, which is less than ideal. UUIDs were famously very difficult to do, with many people resorting to [0-9a-f-]+, whereas the correct regex is in fact [0-9a-f]{12}4[0-9a-f]{3}[89ab][0-9a-f]{15}\Z.
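To show why the loose pattern is a problem, here's a small sketch in plain Python (no Django involved; LOOSE_UUID and is_valid_uuid are names made up for illustration). The permissive character class matches strings which aren't UUIDs at all, which is exactly why the extra validation ended up in the view:

```python
import re
import uuid

# The permissive pattern many projects fell back on.
LOOSE_UUID = re.compile(r"[0-9a-f-]+\Z")


def is_valid_uuid(value):
    """Return True only if value parses as a UUID (the check a view had to repeat)."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


real = str(uuid.uuid4())
junk = "----dead-beef----"

# Both strings get past the loose pattern...
loose_matches = [bool(LOOSE_UUID.match(s)) for s in (real, junk)]
# ...but only one of them is actually a UUID.
strict_matches = [is_valid_uuid(s) for s in (real, junk)]
```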

Thankfully, Django 2.0 fixed this, by drastically simplifying the URL routing syntax to allow for special keywords to be in place of RegEx capture groups. The new syntax is available with the new path function:

path('articles/<int:year>/', views.year_archive),

This means UUID-based paths can now be written as:

path('articles/<uuid:year>/', views.year_archive),

This is a significant improvement over the previous methods. The regex-based behaviour lives on as the new re_path function, and the previous url function is now just an alias for it, likely to be deprecated in a future release. There's support for the following shorthand types:

  • str
  • int
  • slug
  • uuid
  • path (any non-empty string, including /)

Django Admin

Mobile-friendly

The Django admin is now mobile friendly. This isn't a massive deal considering how few people are likely using it in production, but given it's a fairly useful administration panel for smaller, say self-hosted, projects, it's nice to see it getting some much-needed UI love.

Auto-complete

I've personally had to wrestle quite a lot with performance in the Django Admin caused solely by foreign key and many-to-many fields. By default, the Django Admin renders these as <select /> elements, with all the possible records. This can lead to large lists of models, and potentially some O(n) queries if the model's __str__ method also runs queries.

Django 2.0 resolves this by adding an auto-complete widget for these two field types, which can be opted into per field with ModelAdmin.autocomplete_fields. Rather than rendering a <select />, it renders a custom widget which only searches and populates the results when the user interacts with it. This will greatly increase the performance of large forms.

Stronger password hashes

Django hashes passwords (rather than encrypting them) using PBKDF2 with SHA256 as the underlying digest. I don't want to go into what those are and why they're there now, but trust that it's a very strong hash.

Django 2.0 increases the number of PBKDF2 iterations from 36000 to 100000. This is a very large increase in iterations, and is meant to increase to 180000 rounds in Django 3.0. New passwords use the new count straight away, and existing hashes are upgraded the next time each user logs in or changes their password. For someone like me, this is great news!
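To get a feel for what the extra iterations cost, here's a rough sketch using nothing but the standard library. It calls hashlib.pbkdf2_hmac directly rather than going through Django's hasher, and the helper name pbkdf2_sha256 is my own, so treat it as an approximation of the work involved rather than what Django does internally:

```python
import hashlib
import os
import time


def pbkdf2_sha256(password, salt, iterations):
    """Derive a 32-byte key from password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)


salt = os.urandom(16)
timings = {}

# Time the old (1.11) and new (2.0) iteration counts mentioned above.
for iterations in (36000, 100000):
    start = time.perf_counter()
    digest = pbkdf2_sha256("correct horse battery staple", salt, iterations)
    timings[iterations] = time.perf_counter() - start

for iterations, seconds in timings.items():
    print(f"{iterations} iterations: {seconds * 1000:.1f} ms")
```

The exact numbers depend entirely on the machine, but the cost scales roughly linearly with the iteration count.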

This may have an impact on how long tests take to run, if they're constantly creating and destroying users, as PBKDF2 is deliberately slow to compute, which is compounded by needing to run it many times in test cases. I've seen 25% increases in test speeds just by swapping to an MD5-based hashing backend, which doesn't use PBKDF2 at all.
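For reference, the swap is a single setting. This is a sketch of a test-only settings override (where it lives, say a separate test settings module, is up to the project and is an assumption here):

```python
# Test settings only: MD5 is fast precisely because it is weak,
# so this must never reach production settings.
PASSWORD_HASHERS = [
    "django.contrib.auth.hashers.MD5PasswordHasher",
]
```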

Files can be opened as context managers

Anyone who's opened files with Python will have seen the context manager pattern, and hopefully understands why it's significantly better than opening and closing the file handle manually. This pattern can now be used with Django files from file / image fields. This results in slightly cleaner code which is less prone to leaving handles open to files which aren't needed anymore.
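As a reminder of the pattern itself, here's a plain Python sketch using a temporary file. With a model field the equivalent would be along the lines of "with instance.attachment.open() as handle:", where attachment is a hypothetical FileField:

```python
import os
import tempfile

# Create a throwaway file to read back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello from a context manager\n")
    path = tmp.name

# The handle is closed when the block exits, even if the body raises.
with open(path) as handle:
    contents = handle.read()

handle_closed = handle.closed
os.remove(path)
```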

New Database functions

One of the largest changes in Django 2.0 - 2.2 is the plethora of database functions added. These database functions allow more complex queries than were previously allowed, enabling more computation to be done by the database, rather than requiring pulling all the data into python land and operating there.

As with many other things in Django, said functions are named fairly well:

2.0

  • StrIndex

2.1

  • Chr
  • Left
  • LPad
  • LTrim
  • Ord
  • Repeat
  • Replace
  • Right
  • RPad
  • RTrim
  • Trim

2.2

  • Reverse
  • NullIf
  • Abs
  • ACos
  • ASin
  • ATan
  • ATan2
  • Ceil
  • Cos
  • Cot
  • Degrees
  • Exp
  • Floor
  • Ln
  • Log
  • Mod
  • Pi
  • Power
  • Radians
  • Round
  • Sin
  • Sqrt
  • Tan
  • ExtractIsoYear

With all these new functions, focusing around maths and string manipulation, database servers can be leveraged more, and less data returned to the application server. Exactly how useful these will be to most users, only time will tell.

QuerySet API

QuerySet.iterator chunk size

QuerySet.iterator is an efficient way of loading very large datasets into Django. Simply iterating over a queryset loads the entire result set into memory before iterating over it. iterator instead streams the results in chunks (using server-side cursors where the database supports them), so a much smaller amount of data is held in memory at once. The new chunk_size argument allows tuning of this to improve performance. The default is 2000, which represents something close to how it worked before.
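The same idea can be sketched at the DB-API level with sqlite3 from the standard library. This isn't Django's implementation, just the underlying technique; iter_rows and the article table are made up for the example. In Django itself the call would be queryset.iterator(chunk_size=500):

```python
import sqlite3


def iter_rows(connection, sql, chunk_size=2000):
    """Yield rows one at a time while only fetching chunk_size rows per round trip."""
    cursor = connection.execute(sql)
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield from rows


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
connection.executemany(
    "INSERT INTO article (title) VALUES (?)",
    [(f"Article {n}",) for n in range(5000)],
)

# Only 500 rows are held in memory at any one time.
total = sum(1 for _ in iter_rows(connection, "SELECT id, title FROM article", chunk_size=500))
```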

QuerySet.values_list can return named tuples

Named tuples are much like tuples, but their fields are, well, named! This is a much lighter data structure than a dictionary, and is also immutable. values_list being able to return these (by passing named=True) means the returned objects can be destructured in a much nicer way, and allows stronger type hinting.
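For anyone who hasn't used them, this is roughly what a row looks like, built by hand with collections.namedtuple. The Row type and its fields are invented for the example; Django generates its own row type from the requested field names when calling something like values_list("id", "title", named=True):

```python
from collections import namedtuple

# Stand-in for the row type produced by a named values_list call.
Row = namedtuple("Row", ["id", "title"])

row = Row(id=1, title="Django 2.2")

# Fields can be read by name, or unpacked like a normal tuple.
article_id, title = row

# Named tuples are immutable, so assignment fails.
try:
    row.title = "changed"
except AttributeError:
    immutable = True
else:
    immutable = False
```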

QuerySet.explain, explained.

The new explain method on a QuerySet hooks into the existing SQL EXPLAIN statement to provide additional execution detail on queries. This may be useful to diagnose slow running queries and attempt to optimise them.
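To give an idea of what comes back, here's the raw equivalent against SQLite using only the standard library (the table and index are made up, and the exact wording of the plan varies between SQLite versions). In Django this would just be print(queryset.explain()):

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
connection.execute("CREATE INDEX article_title ON article (title)")

# Ask the database how it intends to run the query, without running it.
plan = connection.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM article WHERE title = ?",
    ("Django 2.2",),
).fetchall()

for row in plan:
    # The last column holds the human-readable description of the step.
    print(row[-1])
```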

QuerySet.bulk_update

Updating many model instances at once often required either using a separate update query, or iterating over the instances and saving each one, resulting in O(n) queries. Neither of these is ideal. Django 2.2 introduces the bulk_update method, which takes a list of modified model instances, and saves them in (generally) a single query.

bulk_update needs to know which fields it's updating, passed as the second argument, so if different instances have modifications to different fields, this may not be ideal.
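As far as I understand it, the trick is a single UPDATE with a CASE branch per row. Here's a sketch of that shape of query using sqlite3 from the standard library; it's an illustration of the technique rather than the exact SQL Django emits, and the article table is made up. The Django call would be something like Article.objects.bulk_update(articles, ["title"]), with Article being a hypothetical model:

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
connection.executemany(
    "INSERT INTO article (id, title) VALUES (?, ?)",
    [(1, "draft one"), (2, "draft two"), (3, "draft three")],
)

# New titles keyed by primary key, as if three instances had been edited.
changes = {1: "Final one", 2: "Final two", 3: "Final three"}

# Build one UPDATE with a CASE branch per row, instead of one UPDATE per row.
whens = " ".join("WHEN ? THEN ?" for _ in changes)
placeholders = ", ".join("?" for _ in changes)
sql = f"UPDATE article SET title = CASE id {whens} END WHERE id IN ({placeholders})"
params = [value for pair in changes.items() for value in pair] + list(changes)

cursor = connection.execute(sql, params)
titles = [row[0] for row in connection.execute("SELECT title FROM article ORDER BY id")]
```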

Personally, I can't think of many places this will be necessary, but I'm sure someone can!

createsuperuser password validators

Django 1.9 added support for password validators, which can be used to measure the strength of users' passwords against pre-defined requirements. During development these often aren't useful, and can be annoying. createsuperuser now asks whether these validators should be bypassed, allowing super users to have weaker passwords for development.

Even though this exists, please don't use it in production!

Secure JSON serialization into HTML

Anyone who's had to dump JSON blobs into HTML pages should have come across django-argonauts (if you're doing this without django-argonauts, fear). django-argonauts helps prevent multiple different classes of XSS attacks; there's more information on this in the project's README.

Django now has some built-in support for protecting against these kinds of attacks, via the new json_script filter. This takes an object in template context, serializes it to JSON (securely), and wraps it in a script tag, resulting in:

<script id="hello-data" type="application/json">{"hello": "world"}</script>

This can then be used by JavaScript directly by getting the tag by ID. If you still need to inject data directly into JavaScript source, django-argonauts still provides additional functionality.
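The "securely" part boils down to making sure the JSON can never close the script tag early. Below is a plain Python sketch of that idea, not Django's actual code: json_for_script and ESCAPES are my own names, and it assumes the element ID is trusted. In a template the real thing is used as {{ value|json_script:"hello-data" }}:

```python
import json

# Characters which could end the script element or start new markup.
ESCAPES = {ord("<"): "\\u003C", ord(">"): "\\u003E", ord("&"): "\\u0026"}


def json_for_script(value, element_id):
    """Serialise value to JSON which is safe to embed inside a <script> tag."""
    payload = json.dumps(value).translate(ESCAPES)
    return f'<script id="{element_id}" type="application/json">{payload}</script>'


# Data which would break out of the tag if dumped naively.
hostile = {"hello": "</script><script>alert(1)</script>"}
html = json_for_script(hostile, "hello-data")

# The escaped payload still decodes to the original data.
payload = html[html.index(">") + 1 : html.rindex("</script>")]
round_tripped = json.loads(payload)
```

Because the escapes are valid JSON unicode escapes, the JavaScript side can still JSON.parse the tag's text content and get the original data back.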

Constraints

The new constraints API in Django 2.2 allows for far greater control of database-level validation than was previously available with field validators. Constraints are declared at the model level, in Meta.constraints, rather than on individual fields. Django 2.2 comes with 2 built-in constraints: UniqueConstraint and CheckConstraint. Both are added to the table schema by migrations and enforced by the database itself, so they need no additional queries when modifying a model instance; a violation surfaces as an IntegrityError when saving.

UniqueConstraint creates a unique constraint over any number of fields, in much the same way unique_together worked. It also accepts a condition argument, a Q object which restricts the rows the constraint applies to. For example, UniqueConstraint(fields=['user'], condition=Q(status='DRAFT'), name='unique_draft_user') ensures that each user only has one draft. (The name argument is required.)

CheckConstraint works much like standard field-level validators, however can work on multiple fields at once, as it uses Q objects to specify the validation.
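To show what these look like from the database's point of view, here's a sketch using sqlite3 from the standard library. The post table, its columns, and the try_insert helper are all invented for the example, and the SQL is hand-written rather than what Django's migrations would generate: a CHECK clause stands in for a CheckConstraint, and a partial unique index stands in for a conditional UniqueConstraint.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    """
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        word_count INTEGER NOT NULL,
        CHECK (word_count >= 0)
    )
    """
)
# Partial unique index: at most one DRAFT row per user.
connection.execute(
    "CREATE UNIQUE INDEX one_draft_per_user ON post (user_id) WHERE status = 'DRAFT'"
)


def try_insert(user_id, status, word_count):
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        connection.execute(
            "INSERT INTO post (user_id, status, word_count) VALUES (?, ?, ?)",
            (user_id, status, word_count),
        )
    except sqlite3.IntegrityError:
        return False
    return True


results = [
    try_insert(1, "DRAFT", 100),      # first draft for user 1: accepted
    try_insert(1, "PUBLISHED", 500),  # not a draft: accepted
    try_insert(1, "DRAFT", 50),       # second draft for user 1: rejected
    try_insert(2, "DRAFT", -1),       # negative word count: rejected
]
```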

No more headers in migrations

Whenever manage.py makemigrations is run, Django injects a header into the file with the generated date and version.

# -*- coding: utf-8 -*-
# Generated by Django 1.11.1 on 2017-06-07 16:10

The new --no-header argument removes this when generating new migrations. In a typical workflow, there's little reason to remove this, but it's nice there's the option now!

Migration planning

When executing migrations, especially in a production environment, it's useful to know which migrations are going to run. This is especially useful when certain migrations may require some site downtime. Previously, it was possible to see the migrations to run by using manage.py showmigrations | grep -F "[ ]", but this is less than ideal, and a bit of a hack.

Django 2.2 adds the ability to see the migrations before they are executed, rather than having to roll this functionality yourself. This is done using the new --plan flag on the migrate command (manage.py migrate --plan).

request.headers

Previously, request.META gave access to HTTP headers, in a slightly weird way.

any HTTP headers in the request are converted to META keys by converting all characters to upper-case, replacing any hyphens with underscores and adding an HTTP_ prefix to the name. So, for example, a header called X-Bender would be mapped to the META key HTTP_X_BENDER.

For anyone who's worked with raw HTTP headers in the past, this is a little weird. Now, request objects have a headers attribute which allows a far more sane API over the raw request headers. As all headers should be, the accessing API is case-insensitive!
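To make the difference concrete, here's a plain Python sketch of the old key mangling next to a tiny case-insensitive lookup. Neither meta_key nor CaseInsensitiveHeaders is Django code; they're stand-ins written for this example to mimic the two access styles:

```python
def meta_key(header_name):
    """Convert an HTTP header name to the key it gets in request.META."""
    # Content-Type and Content-Length are special-cased by Django and get
    # no HTTP_ prefix; this sketch ignores that exception.
    return "HTTP_" + header_name.upper().replace("-", "_")


class CaseInsensitiveHeaders:
    """Tiny read-only mapping which ignores the case of header names."""

    def __init__(self, headers):
        self._headers = {name.lower(): value for name, value in headers.items()}

    def __getitem__(self, name):
        return self._headers[name.lower()]

    def get(self, name, default=None):
        return self._headers.get(name.lower(), default)


old_style = meta_key("X-Bender")
headers = CaseInsensitiveHeaders({"X-Bender": "rodriguez"})
```

With the old API the lookup is request.META["HTTP_X_BENDER"]; with the new one it's request.headers["X-Bender"], in whatever casing you like.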

Use of sqlparse

In previous versions, Django carried its own code for the bits of SQL handling it needed, which added additional, and arguably unnecessary, code to the core of Django. Django 2.2 adds a new required dependency to take care of some of this: sqlparse. sqlparse is a non-validating SQL parser for Python, which can parse, split and format SQL statements. According to the release notes, it's there to simplify a few parts of Django's database handling. This doesn't extract Django's ORM into an external package, it just replaces a small section of it with an existing library.

Using an external library brings with it many benefits. There's now less code inside the core Django codebase, meaning there's less for the core developers to manage and tie in to Django's release cycle. (Wild speculation alert!) It also might mean it gets faster. Society is built on specialisation, therefore hopefully a library designed to do SQL parsing will be faster and more robust than the one originally written for Django.

Watchman

Watchman is a technology from Facebook which enables efficient and powerful file watching in a directory. Django now has the ability to use this when doing live code reload in the dev server, rather than the pure-Python alternative. This will give a massive performance improvement on large codebases, and use fewer resources while doing so.

Watchman support isn't enabled by default. It requires an additional optional dependency pywatchman to operate.

Database instrumentation

Django supports many different ways of modifying the querying and model lifecycle, from executing arbitrary SQL, to using signals to listen for specific model events. Django 2.0 introduces instrumentation, which allows intermediary code to be wrapped around each query, enabling logging, modification, or outright blocking of queries.
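A wrapper is just a callable which receives the next execute function along with the query details. The sketch below runs without Django by passing in a stand-in fake_execute; the wrapper names and the executed list are my own. In a real project a wrapper is installed with "with connection.execute_wrapper(blocking_wrapper):", where connection comes from django.db:

```python
executed = []


def logging_wrapper(execute, sql, params, many, context):
    """Record each statement, then hand it on to the next layer."""
    executed.append(sql)
    return execute(sql, params, many, context)


def blocking_wrapper(execute, sql, params, many, context):
    """Refuse to run any query at all."""
    raise RuntimeError(f"Query blocked: {sql}")


def fake_execute(sql, params, many, context):
    """Stand-in for the real database call, so the sketch runs without Django."""
    return f"ran {sql}"


# The logging wrapper lets the query through and records it.
result = logging_wrapper(fake_execute, "SELECT 1", None, False, {})

# The blocking wrapper never calls execute, so the query never runs.
try:
    blocking_wrapper(fake_execute, "SELECT 2", None, False, {})
except RuntimeError as error:
    blocked_message = str(error)
```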

An interesting use for this would be explicitly disabling queries in certain parts of the code, with django-zen-queries (ships in https://github.com/dabapps/django-zen-queries/pull/12)

Upgrading

With Django 2.2 now released, it's time to actually start upgrading. Django 1.11 stops receiving support in April 2020, so large complex codebases don't have long! The next LTS version, 3.2, is due in April 2021. Who knows what Django will look like then!

(On a complete tangent, don't do large software releases on April 1st!)