Add speaker notes

Jake Howard 2024-06-11 21:03:01 +01:00
parent eb506aff68
commit d5a5bc6788
Signed by: jake
GPG key ID: 57AFB45680EDD477

slides.md

@ -10,7 +10,7 @@ themeConfig:
# Recovering [deleted]{style="color: #fd5765"} Wagtail pages and/or Django models # Recovering [deleted]{style="color: #fd5765"} Wagtail pages and/or Django models
### Jake Howard{style="color: #e85537;" .mt-10 } ### Jake Howard{.mt-10}
<ul class="list-none! text-sm [&>li]:m-0! mt-1 uppercase"> <ul class="list-none! text-sm [&>li]:m-0! mt-1 uppercase">
<li>Senior Systems Engineer @ Torchbox</li> <li>Senior Systems Engineer @ Torchbox</li>
@ -31,6 +31,22 @@ background: /intranet.png
# Setting the scene # Setting the scene
<!--
- People usually use Wagtail as a website or blog
- But it works really well as an intranet too
- At Torchbox, we use it for internal documentation ("intranet")
- Processes
- Company information
- Links to other places etc
- Been around for a while
- In 2022, we restructured the content
- Make it easier to find things
- Remove duplication
- This didn't quite go to plan
- One afternoon, I was looking to reference a process, and couldn't find it
- Turns out, the entire "Sysadmin" section had completely vanished
-->
--- ---
layout: cover layout: cover
background: /site-history.png background: /site-history.png
@ -38,24 +54,56 @@ background: /site-history.png
# Site history report # Site history report
<!--
- First step: Understanding what happened
- The site history report!
- Fortunately, Wagtail showed _almost_ exactly what had happened, and what I expected
- One staff member deleted the "Sysadmin" section a few days before
- Which deleted every page under it, all 105 of them
- "Radical reorganisation"
-->
--- ---
layout: image layout: image
image: /chat.png image: /chat.png
backgroundSize: contain backgroundSize: contain
--- ---
<!--
- I messaged the person, to better understand what happened
- Assuming they didn't mean to delete all that content
- Hanlon's Razor
- They'd made a new "Sysadmin" section a while ago, before switching strategy to move pages in the existing tree
- They then deleted the wrong one
- Sure, Wagtail shows a confirmation when you're deleting pages, but when you're deleting a lot of pages, and expecting to delete pages, you might not read the message perfectly
- With the content gone, I had to restore from backups.
-->
--- ---
layout: section layout: section
--- ---
# Restoring from backups # Restoring from backups
<!--
- Our intranet is a living document, it gets updated fairly often
- Rolling back the entire system almost 2 days would have meant potentially losing critical changes
- Not to mention people's time they spent making the changes
- It'd be annoying, and we _could_ do it, but I'd rather find another solution
-->
--- ---
layout: section layout: section
--- ---
# _Partially_ restoring from backups # _Partially_ restoring from backups
<!--
- Ideally, what I needed was to restore only the sysadmin pages, leaving all others completely untouched.
- Using a few tricks of Django and Wagtail internals, it's absolutely possible, and we did it
- With 0 downtime, too!
-->
--- ---
layout: section layout: section
--- ---
@ -63,6 +111,11 @@ layout: section
## 1. ## 1.
# Spin up a database backup # Spin up a database backup
<!--
- We back up our intranet nightly, so I downloaded a backup from before the incident
- Start the codebase locally so I can interrogate it
-->
--- ---
layout: section layout: section
--- ---
@ -82,6 +135,12 @@ child_pages = sysadmin_page.get_descendants()
</div> </div>
<!--
- Behind the scenes, Wagtail pages are a tree, implemented using `django-treebeard`.
- When a page is deleted, treebeard is the one who finds all the child pages and deletes them too
- And then Django and postgres deal with cascading the delete
-->
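To make the tree idea concrete, here is a minimal pure-Python sketch of a materialised-path tree, the storage scheme `django-treebeard` uses for Wagtail pages. It is an illustration rather than treebeard's implementation: the paths, page titles, and the `get_descendants` helper below are invented for this example, and the four-characters-per-level step is an assumption about the default configuration.

```python
# Illustrative only: a pure-Python model of a materialised-path tree.
# Every path and title here is made up for this sketch.
STEPLEN = 4  # characters per tree level (assumed default)

pages = {
    "0001": "Root",
    "00010001": "Home",
    "000100010001": "Sysadmin",
    "0001000100010001": "Backups",
    "0001000100010002": "On-call",
    "00010001000100020001": "Escalation",
    "000100010002": "Company information",
}


def get_descendants(path: str) -> list[str]:
    """Return the titles of every page below `path`, in tree order."""
    return [
        title
        for candidate, title in sorted(pages.items())
        # A descendant's path starts with the ancestor's path and is longer.
        if candidate.startswith(path) and len(candidate) > len(path)
    ]


def depth(path: str) -> int:
    """Depth of a node, derived purely from the length of its path."""
    return len(path) // STEPLEN


print(get_descendants("000100010001"))
```

Because descendants share a path prefix, deleting a section is effectively a prefix match, which is why removing one parent page takes the whole subtree with it.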
--- ---
layout: section layout: section
--- ---
@ -100,6 +159,20 @@ collector.collect(list(child_pages) + [sysadmin_page])
</div> </div>
<!--
This is where the magic happens
- Deleting a page deletes more than just a page
- The specific model
- Revisions
- Related models
- Through tables
- `get_descendants` won't get all those
- Calling `.delete` gives you the number of objects, and it's quite a lot
- If you've ever used the Django admin, you know it's capable of finding every model instance before a delete
- That's implemented with an undocumented but simple to use API
- Yes, that's really it. It doesn't delete the models, it just tells us what _would_ be deleted if we triggered a delete.
-->
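The "dry run" behaviour described in these notes can be pictured as a graph walk: start from the objects being deleted and follow every cascading reference, recording what is found without touching anything. The sketch below is a pure-Python illustration of that idea under made-up data; `referenced_by` and `collect` are hypothetical names, and this is not how Django's collector is implemented internally.

```python
from collections import deque

# Hypothetical data: each object maps to the objects that point at it
# through a cascading foreign key, and so would be deleted along with it.
referenced_by = {
    "page:sysadmin": ["page:backups", "revision:1"],
    "page:backups": ["revision:2", "revision:3", "tag_link:7"],
    "revision:1": [],
    "revision:2": [],
    "revision:3": [],
    "tag_link:7": [],
}


def collect(roots: list[str]) -> list[str]:
    """Return every object a delete of `roots` would remove, deleting nothing."""
    found: list[str] = []
    visited: set[str] = set()
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        if obj in visited:
            continue  # already recorded via another relation
        visited.add(obj)
        found.append(obj)
        queue.extend(referenced_by.get(obj, []))
    return found


would_delete = collect(["page:sysadmin"])
print(len(would_delete), would_delete)
```

The result is a list of things to inspect or serialize; the source data is left exactly as it was.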
--- ---
layout: section layout: section
--- ---
@ -109,7 +182,7 @@ layout: section
<div class="pt-5 text-left"> <div class="pt-5 text-left">
```python {all|3-5|all} ```python {all|3-5}
from django.core import serializers from django.core import serializers
class NoM2MSerializer(Serializer): class NoM2MSerializer(Serializer):
@ -136,6 +209,20 @@ with open("deleted-models.json", "w") as f:
} }
</style> </style>
<!--
- `collector.data` now contains all the model instances which were deleted, in memory on my laptop
- My laptop isn't what's running production
- Need to serialize the models into an intermediary format which can then be loaded onto production
- If you're thinking of fixtures, you're right
- Django's fixtures create a JSON representation of a model, so they can be saved in 1 location and loaded into another
- Mostly useful for complex test fixtures (hence the name), but generally useful for cases like this
- [click]`NoM2MSerializer` is a bit special
- When Django serializes a model with an m2m field which doesn't use a custom through table, it inlines the definition, because it's easier to work with
- However, `NestedObjects` still finds these through tables, and tries to load them separately
- Resulting in duplicate objects and referential integrity issues
- Instead, we exclude them
-->
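The duplication problem is easier to see with the fixture shape written out. This pure-Python sketch builds records in the `model` / `pk` / `fields` layout that Django's JSON fixtures use, and drops the inlined many-to-many field because the through-table rows are already emitted as their own records. The model labels, the `M2M_FIELDS` mapping, and `to_fixture` are hypothetical names for illustration; the real slide code subclasses Django's serializer instead.

```python
import json

# Hypothetical records: (model label, primary key, field values).
records = [
    ("intranet.processpage", 1, {"title": "Backups", "tags": [7, 8]}),
    ("intranet.processpage_tags", 10, {"processpage": 1, "tag": 7}),
    ("intranet.processpage_tags", 11, {"processpage": 1, "tag": 8}),
]

# Many-to-many field names per model; assumed to be known up front here.
M2M_FIELDS = {"intranet.processpage": {"tags"}}


def to_fixture(rows):
    """Build fixture-shaped dicts, omitting inlined many-to-many fields."""
    fixture = []
    for model, pk, fields in rows:
        skipped = M2M_FIELDS.get(model, set())
        kept = {name: value for name, value in fields.items() if name not in skipped}
        fixture.append({"model": model, "pk": pk, "fields": kept})
    return fixture


fixture = to_fixture(records)
print(json.dumps(fixture, indent=2))
```

With the inline `tags` list removed, each relationship appears exactly once, as a through-table row, so loading the file does not try to create the same link twice.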
--- ---
layout: section layout: section
--- ---
@ -145,6 +232,10 @@ layout: section
### `manage.py loaddata` ### `manage.py loaddata`
<!--
- We have a JSON file, the inverse is just `manage.py loaddata`
-->
--- ---
layout: center layout: center
--- ---
@ -186,6 +277,10 @@ with open("deleted-models.json", "w") as f:
} }
</style> </style>
<!--
- If we combine it all together, this is the big script we end up with
-->
--- ---
layout: fact layout: fact
--- ---
@ -193,6 +288,16 @@ layout: fact
### 5. ### 5.
# **Test!** # **Test!**
<!--
- For what I hope are obvious reasons, this needed to be tested!
- I deleted the pages through the Wagtail admin locally, and then restored them to confirm they're all the same
- I'm glad I did, because there was an issue: Search indexes
- The search index objects (we use postgres) were picked up by `NestedObjects`
- They didn't like being restored
- So I skipped them and moved on, knowing I'd just rebuild the index later.
- `manage.py fixtree` also reports any tree issues, which there weren't
-->
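One way to make the "are they all the same?" check concrete is to dump the affected objects before the delete and again after the restore, then compare the two dumps. The notes do not say how the comparison was actually done, so treat this as a sketch under assumptions: `diff_fixtures`, the sample data, and the `search.indexentry` label are all hypothetical, and the input is assumed to be fixture-shaped dictionaries.

```python
def index(fixture):
    """Key each fixture record by (model, pk) for easy comparison."""
    return {(obj["model"], obj["pk"]): obj["fields"] for obj in fixture}


def diff_fixtures(before, after, ignore_models=frozenset()):
    """Report records that are missing, unexpected, or changed after a restore."""
    old, new = index(before), index(after)
    keys = {key for key in old.keys() | new.keys() if key[0] not in ignore_models}
    return {
        "missing": sorted(key for key in keys if key in old and key not in new),
        "extra": sorted(key for key in keys if key in new and key not in old),
        "changed": sorted(
            key for key in keys if key in old and key in new and old[key] != new[key]
        ),
    }


before = [
    {"model": "intranet.page", "pk": 1, "fields": {"title": "Backups"}},
    {"model": "intranet.page", "pk": 2, "fields": {"title": "On-call"}},
    {"model": "search.indexentry", "pk": 9, "fields": {"body": "backups"}},
]
after = [
    {"model": "intranet.page", "pk": 1, "fields": {"title": "Backups"}},
    {"model": "intranet.page", "pk": 2, "fields": {"title": "On-call"}},
]

# Search index rows are skipped on purpose, since they get rebuilt later.
report = diff_fixtures(before, after, ignore_models={"search.indexentry"})
print(report)
```

An empty report for everything except the deliberately skipped models is the signal that the restore reproduced the original data.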
--- ---
layout: image-right layout: image-right
image: /red-button.png image: /red-button.png
@ -202,18 +307,29 @@ class: flex justify-center flex-col items-center
### 6. ### 6.
# Showtime! # Showtime!
--- <v-clicks>
layout: image-right
image: https://images.unsplash.com/photo-1622021134395-d26aab83c221?q=80&w=1470&auto=format&fit=crop&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D
class: flex justify-center flex-col text-xl
---
1. Backup! 1. Backup! ✅
2. Transfer `deleted-models.json` to server 2. Send `deleted-models.json` to server ✅
3. `loaddata` 3. `loaddata`
4. `checktree` 4. `checktree`
5. `update_index` 5. `update_index`
6. `rebuild_reference_index` 6. `rebuild_reference_index`
</v-clicks>
<!--
- The tense bit
- Once I was happy, I ran the same steps on production
- Our intranet runs on Heroku, so I had to do a few dances to get the JSON file up there.
- [click]Before I began, I did a backup, because I'm a good sysadmin
- [click]With the data file in place, [click]I crossed everything and ran `loaddata`
- Pages popped up in the admin as if they never left
- [click]`checktree` worked.
- [click]`update_index` worked.
- [click] As did `rebuild_reference_index`
- The new pages were now findable
-->
--- ---
layout: cover layout: cover
@ -222,8 +338,20 @@ background: /sysadmin.png
# Conclusion # Conclusion
<!--
- With a few hours' work, the pages were back
- There was no downtime
- No content freeze
- No data loss
- Most people didn't even know there was an issue
- I've used this trick a few times in my career, for both Wagtail and plain Django sites
- Ironically, just a few weeks after the blog post was published
- Works identically for Django sites, so long as you know how to reconstruct the delete query.
- Hopefully this helps you out as much as it has me!
-->
--- ---
layout: end layout: end
--- ---
END https://wagtail.org/blog/recovering-deleted-pages-and-models/