2022-01-07 19:22:10 +00:00

Git object storage

Git does some pretty magical things behind the scenes. The important one here is that Git stores files as "objects", referenced by a hash of their contents. If you move a file, it doesn't duplicate the object, as the content hasn't changed. When you create a commit, it references these objects (through a "tree" object describing the directory layout). Commits themselves are also objects, which are referenced by branches. If you're interested in more, check out the Plumber's Guide to Git.
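To see the content addressing in action, here's a minimal sketch you can run in a throwaway repository (the file names are made up for illustration). `git hash-object` computes the ID Git would give a piece of content, so a copy of a file under a different name gets exactly the same ID, and staging both only stores one blob:

```shell
#!/bin/sh
# Content addressing in a throwaway repo: the same bytes always get the same object ID
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'hello\n' > original.txt
cp original.txt moved.txt   # same content, different name

# Without -w, hash-object only computes the ID Git would store the content under
hash_original=$(git hash-object original.txt)
hash_moved=$(git hash-object moved.txt)
echo "original.txt -> $hash_original"
echo "moved.txt    -> $hash_moved"

# Stage both files: the index lists two paths, but both point at a single blob
git add original.txt moved.txt
git ls-files --stage
```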

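The most important thing about objects is that they very rarely get deleted. Unreachable objects are generally only cleaned up when Git's garbage collection (git gc) prunes them, and by default it leaves recent ones alone. If you delete a file you'd previously committed, its object is still there. Most importantly for my case, if you do some branch-fu and remove a commit, the commit may well still exist.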
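Here's a rough sketch of that, again in a throwaway repository with made-up commit messages: commit a draft, throw it away with `git reset --hard`, then ask `git cat-file` about the old SHA. The branch no longer contains the commit, but the object is still sitting in the database:

```shell
#!/bin/sh
# Orphan a commit, then show its object still exists (throwaway repo, made-up messages)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "first" > post.md
git add post.md
git commit -q -m "Initial commit"

echo "draft" >> post.md
git commit -q -am "Add draft post"
lost=$(git rev-parse HEAD)   # note the SHA before orphaning the commit

# Move the branch back one commit: no branch references the draft commit any more
git reset -q --hard HEAD~1
git log --oneline            # only "Initial commit" is listed

# ...but the commit object is still in the object database
git cat-file -t "$lost"      # prints "commit"
git show --stat "$lost"
```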

Finding commits / files

As mentioned before, commits and files are both "objects". To find my missing commit, all I need to do is look through the object files for a string which I know to be in the commit message or file body, right?

Wrong! Sort of. Git objects are stored compressed (zlib-compressed as loose files, and often delta-compressed inside packfiles), which means simply using grep (or rg if you're cool) on the files doesn't work. Git does have a command to search through files (git grep), but it wasn't appropriate for this use case: by default it only searches the tracked files in the checked-out working tree, and it can only look at other commits if you already know which ones to name. Instead, we need to use some Git tooling to get at the data cleanly.
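As a quick illustration (a sketch in a scratch repository, with an invented phrase), this writes a single loose object, then compares grepping its file under `.git/objects` with asking `git cat-file -p` for it. It assumes the object stays loose, which it will until something like `git gc` packs it:

```shell
#!/bin/sh
# Show that loose objects are compressed on disk (throwaway repo, invented phrase)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "a very distinctive phrase" > note.txt
blob=$(git hash-object -w note.txt)   # -w actually writes the object into .git/objects

# Loose objects live at .git/objects/<first 2 hex chars>/<remaining chars>
path=".git/objects/$(echo "$blob" | cut -c1-2)/$(echo "$blob" | cut -c3-)"

# grep only sees zlib-compressed bytes, so the phrase isn't there in plain text
if grep -q "distinctive" "$path"; then
    echo "found in raw object file"
else
    echo "not found in raw object file"
fi

# git cat-file decompresses the object and prints the original content
git cat-file -p "$blob"
```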

The first step is to list all objects Git knows about, including those not referenced by any branch. StackOverflow to the rescue on this one. This script lists out the SHAs of all objects, which can then be passed into git show to get the real content rather than the compressed version. Piping that into a text file, I've now got an entire dump of everything Git knows about my repository: commits, files, the lot.

```shell
# List every object SHA, `git show` each one, and save the lot to a text file
bash ~/object-list.sh | xargs -n1 git show > ~/out.txt
```
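I haven't reproduced the StackOverflow script here. As an alternative sketch (assuming Git 2.6 or newer, which added `--batch-all-objects`), `git cat-file` can list every object itself, loose or packed, and filtering down to just commits keeps the dump far smaller. The `list_objects` / `dump_commits` helpers and the demo repository are hypothetical, there only to show it finding an orphaned commit:

```shell
#!/bin/sh
set -e

# Hypothetical helper: every object Git has, loose or packed, reachable or not
list_objects() {
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)'
}

# Hypothetical helper: `git show` output for every commit object, orphaned ones included
dump_commits() {
    list_objects | awk '$2 == "commit" { print $1 }' | xargs -n1 git show
}

# --- demo: orphan a commit in a throwaway repository, then find it again ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "kept" > post.md && git add post.md && git commit -q -m "Kept commit"
echo "lost" >> post.md && git commit -q -am "Lost draft commit"
git reset -q --hard HEAD~1   # the draft commit is now unreferenced by any branch

dump_commits > out.txt
grep -n "Lost draft commit" out.txt
```

In a real repository you'd just run `dump_commits > ~/out.txt` from inside it, then search the file as before.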

It ended up being a lot more than I wanted (the file was around 79MB), but hey, I'll take having too much context over not enough!

Searching large files

For searching large files, I recommend glogg. It's pretty bare-bones, but it deals with huge files incredibly well (not that 79MB is very large).

Searching through the output file, I eventually found the commits I needed. Because the file contained the output of git show, I had two options: copy the content / diff out and store it for later use, or, since git show includes each commit's SHA, git cherry-pick the commit onto my branch and push it. I went for the former, because it was simpler and I decided I didn't want to push those changes quite yet.
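Both options look roughly like this, sketched in a throwaway repository with an invented commit. In the real thing, the SHA would come from the `commit <sha>` line of the matching entry in out.txt rather than from `git rev-parse`:

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "kept" > post.md && git add post.md && git commit -q -m "Kept commit"
echo "lost" >> post.md && git commit -q -am "Lost draft commit"
lost=$(git rev-parse HEAD)   # stand-in for the SHA copied out of out.txt
git reset -q --hard HEAD~1   # orphan the commit

# Option 1: save the commit's message and diff to a file for later use
git show "$lost" > recovered.patch

# Option 2: re-apply the orphaned commit on top of the current branch
git cherry-pick "$lost"
git log --oneline
```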

Lessons

Whilst rather stressful in the moment, experiences like these aren't without their lessons:

- Don't delete your local copies of posts until they're actually live, rather than just committed
- Be careful when relying on Git magic and rebasing
- Git is pretty damn good at making sure you don't lose any data
