---
title: Recovering deleted Wagtail pages and Django models
class: text-center
highlighter: shiki
transition: slide-left
mdc: true
themeConfig:
  primary: '#fd5765'
---

# Recovering [deleted]{style="color: #fd5765"} Wagtail pages and/or Django models

### Jake Howard{.mt-10}

<ul class="list-none! text-sm [&>li]:m-0! mt-1 uppercase">
<li>Senior Systems Engineer @ Torchbox</li>
<li>Core Team, Security Team & Performance Working Group @ Wagtail</li>
</ul>

<ul class="list-none! text-sm [&>li]:m-0! mt-3">
<li><mdi-earth /> theorangeone.net</li>
<li><mdi-twitter /> @RealOrangeOne</li>
<li><mdi-github /> @RealOrangeOne</li>
<li><mdi-mastodon /> @jake@theorangeone.net</li>
</ul>

---
layout: cover
background: /intranet.png
---

# Setting the scene

<!--
- People usually use Wagtail for a website or blog
- But it works really well as an intranet too
- At Torchbox, we use it for internal documentation (our "intranet"):
  - Processes
  - Company information
  - Links to other places, etc.
- It's been around for a while
- In 2022, we restructured the content to:
  - Make it easier to find things
  - Remove duplication
- This didn't quite go to plan
- One afternoon, I went to reference a process and couldn't find it
- Turns out, the entire "Sysadmin" section had completely vanished
-->

---
layout: cover
background: /site-history.png
---

# Site history report

<!--
- First step: understand what happened
- The site history report!
- Fortunately, Wagtail showed _almost_ exactly what had happened, and it was what I expected
- A staff member had deleted the "Sysadmin" section a few days earlier
- Which also deleted every page under it, all 105 of them
- "Radical reorganisation"
-->

---
layout: image
image: /chat.png
backgroundSize: contain
---

<!--
- I messaged the person to better understand what happened
- I assumed they didn't mean to delete all that content
- Hanlon's Razor
- They'd created a new "Sysadmin" section a while ago, before switching strategy to moving pages within the existing tree
- They then deleted the wrong one
- Sure, Wagtail shows a confirmation when you delete pages, but when you're deleting lots of pages, and expecting to, it's easy not to read the message carefully
- With the content gone, I had to restore from backups
-->

---
layout: section
---

# Restoring from backups

<!--
- Our intranet is a living document; it gets updated fairly often
- Rolling the entire system back almost 2 days would have meant potentially losing critical changes
- Not to mention the time people had spent making those changes
- It'd be annoying, and we _could_ do it, but I'd rather find another solution
-->

---
layout: section
---

# _Partially_ restoring from backups

<!--
- Ideally, I needed to restore only the Sysadmin pages, leaving all the others completely untouched
- Using a few tricks from Django and Wagtail internals, that's absolutely possible, and we did it
- With zero downtime, too!
-->

---
layout: section
---

## 1.
# Spin up a database backup

<!--
- We back up our intranet nightly, so I downloaded a backup from before the incident
- Then I started the codebase locally against it, so I could interrogate it
-->

---
layout: section
---

## 2.
# Locate the page models

<div class="pt-5 text-left">

```python
from wagtail.models import Page

# The top of the deleted section, as it exists in the backup
sysadmin_page = Page.objects.get(id=91)

# Every page beneath it in the tree
child_pages = sysadmin_page.get_descendants()
```

</div>

<!--
- Behind the scenes, Wagtail pages form a tree, implemented using `django-treebeard`
- When a page is deleted, treebeard finds all of its child pages and deletes them too
- Django and PostgreSQL then deal with cascading the delete
-->

---
layout: section
---

## 3.
# Locate what was deleted

<div class="pt-5 text-left">

```python
from django.contrib.admin.utils import NestedObjects

# Finds everything a delete would cascade to, without deleting anything
collector = NestedObjects(using="default")
collector.collect(list(child_pages) + [sysadmin_page])
```

</div>

<!--
This is where the magic happens

- Deleting a page deletes more than just a page:
  - The specific page model
  - Revisions
  - Related models
  - Through tables
- `get_descendants` won't find all of those
- Calling `.delete()` reports how many objects it removed, and it's quite a lot
- If you've ever used the Django admin, you'll know it can list every model instance a delete will remove before you confirm it
- That's implemented with an undocumented, but simple-to-use, API
- Yes, that's really it. It doesn't delete anything; it just tells us what _would_ be deleted if we triggered a delete
-->

---
layout: section
---

## 4.
# Serialize

<div class="pt-5 text-left">

```python {all|3-5}
from django.core.serializers.json import Serializer

class NoM2MSerializer(Serializer):
    def handle_m2m_field(self, obj, field):
        pass  # Don't inline m2m values

def get_model_instances():
    # collector.data maps each model to the set of instances a delete would remove
    for instances in collector.data.values():
        yield from instances

with open("deleted-models.json", "w") as f:
    NoM2MSerializer().serialize(
        get_model_instances(),
        stream=f
    )
```

</div>

<style>
pre.shiki {
  font-size: 0.8rem !important;
  line-height: 18px;
}
</style>

<!--
- `collector.data` now contains all the model instances which were deleted, in memory on my laptop
- My laptop isn't what's running production
- I needed to serialize the models into an intermediary format which could then be loaded onto production
- If you're thinking of fixtures, you're right
- Django's fixtures store a serialized (here, JSON) representation of model instances, so they can be dumped in one place and loaded into another
- They're mostly used for complex test fixtures (hence the name), but they're useful for cases like this too
- [click]`NoM2MSerializer` is a bit special
- When Django serializes a model with a many-to-many field that doesn't use a custom `through` model, it inlines the related primary keys, because that's easier to work with
- However, `NestedObjects` also finds the rows in the auto-created through table, so they get serialized separately too
- Loading both results in duplicate rows and integrity errors
- Instead, we skip the inlined copy and let the through-table rows restore the relationships
-->

---
layout: section
---

## 4a.
# [De]{.italic}serialize

### `manage.py loaddata`

<!--
- Now that we have a JSON file, the inverse is simply `manage.py loaddata`
-->

---
layout: center
---

# `restore-deleted-pages.py`

```python {all}{lines:true}
from django.contrib.admin.utils import NestedObjects
from django.core.serializers.json import Serializer

from wagtail.models import Page

class NoM2MSerializer(Serializer):
    def handle_m2m_field(self, obj, field):
        pass

sysadmin_page = Page.objects.get(id=91)

child_pages = sysadmin_page.get_descendants()

collector = NestedObjects(using="default")
collector.collect(list(child_pages) + [sysadmin_page])

def get_model_instances():
    for instances in collector.data.values():
        yield from instances

with open("deleted-models.json", "w") as f:
    NoM2MSerializer().serialize(
        get_model_instances(),
        stream=f
    )
```

<style>
pre.shiki {
  font-size: 0.7rem !important;
  line-height: 17px;
}
</style>

<!--
- Combining it all together, this is the script we end up with
- It needs a configured Django environment pointed at the backup database, for example by running it through `manage.py shell`
-->

---
layout: fact
---

### 5.
# **Test!**

<!--
- For what I hope are obvious reasons, this needed to be tested!
- Locally, I deleted the pages through the Wagtail admin, then restored them to confirm everything came back the same
- I'm glad I did, because there was an issue: search indexes
- The search index objects (we use PostgreSQL) were picked up by `NestedObjects`
- They didn't like being restored
- So I skipped them and moved on, knowing I'd rebuild the index later
- `manage.py fixtree` also reports any tree issues, and there weren't any
-->

---
layout: image-right
image: /red-button.png
class: flex justify-center flex-col items-center
---

### 6.
# Showtime!

<v-clicks>

1. Backup! ✅
2. Send `deleted-models.json` to server ✅
3. `loaddata` ✅
4. `fixtree` ✅
5. `update_index` ✅
6. `rebuild_references_index` ✅

</v-clicks>

<!--
- The tense bit
- Once I was happy, I ran the same steps on production
- Our intranet runs on Heroku, so getting the JSON file up there took a bit of a dance
- [click]Before I began, I took a backup, because I'm a good sysadmin
- [click]With the data file in place, [click]I crossed everything and ran `loaddata`
- Pages popped up in the admin as if they'd never left
- [click]`fixtree` worked
- [click]`update_index` worked
- [click]As did `rebuild_references_index`
- The restored pages were findable again
-->

---
layout: cover
background: /sysadmin.png
---

# Conclusion

<!--
- With a few hours' work, the pages were back
- There was no downtime
- No content freeze
- No data loss
- Most people didn't even know there was an issue
- I've used this trick a few times in my career, for both Wagtail and plain Django sites
- Ironically, one of those times was just a few weeks after the blog post was published
- It works the same way for plain Django sites, so long as you know how to reconstruct the delete query
- Hopefully this helps you out as much as it has me!
-->

---
layout: end
---

<div class="absolute right-1/2 translate-x-1/2 top-3">
<img src="/blog-post-qrcode.png" width="190px" />
</div>

https://wagtail.org/blog/recovering-deleted-pages-and-models/