title: Recovering orphaned git commits
date: 2020-10-22
image: unsplash:Vqg809B-SrE
tags: programming

I recently had a far from fun morning. I had my website cloned on my desktop, with some commits which weren't upstreamed. Having done some work on my laptop, upstream was ahead of my desktop, which needed rectifying.

git pull -r is a great feature of git, which rebases your local commits rather than creating a merge commit when pulling from a diverged remote. Normally this would replay my commits on top of upstream, leaving me with my 2 local commits ready to push. For reasons beyond my understanding, I was instead left with 0 commits to push - the 2 commits I had locally had vanished.

Git object storage

Git does some pretty magical things behind the scenes. The important one here is that Git stores files in "objects", referenced by a hash of their contents. If you move a file, it doesn't duplicate the object, as the content hasn't changed. When you create a commit, it references these objects. Commits themselves are also objects, which are referenced by branches. If you're interested in more, check out the Plumber's guide to Git.
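To make the "same content, same object" idea concrete, here's a minimal sketch. It assumes git is installed and works in a throwaway repository; the file names are made up for illustration.

```shell
# Scratch repository so nothing real is touched (location is arbitrary).
repo=$(mktemp -d) && cd "$repo" && git init -q

printf 'hello\n' > a.txt
cp a.txt renamed.txt

# hash-object prints the object ID for a file's content; -w also stores it.
id_a=$(git hash-object -w a.txt)
id_b=$(git hash-object -w renamed.txt)
echo "$id_a"
echo "$id_b"

# Both names map to one stored object, which cat-file reads back by ID.
git cat-file -t "$id_a"
git cat-file -p "$id_a"
```

The two IDs printed should be identical, because the ID depends on the content rather than the file name.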

The most important thing about objects is that it's very rare they actually get deleted. If you delete a committed file, the object is still there. Most importantly for my case, if you do some branch-fu and remove a commit, the commit object usually sticks around, at least until git's garbage collection eventually prunes it.
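A rough way to see this for yourself, again in a throwaway repository. The git_demo wrapper, file name and commit messages are all hypothetical, and a hard reset stands in for whatever branch-fu removed the commit.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
# Hypothetical wrapper so the demo commits don't depend on global git config.
git_demo() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo one > file.txt && git add file.txt && git_demo commit -q -m "first"
echo two > file.txt && git_demo commit -q -am "second"
lost=$(git rev-parse HEAD)

# "Branch-fu": move the branch back so it no longer includes the second commit.
git reset -q --hard HEAD~1
git log --oneline          # only "first" is listed now

# The commit object is still in the object store and can be read by ID.
git cat-file -t "$lost"
git show -s --format=%s "$lost"
```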

Finding commits / files

As mentioned before, commits and files are both "objects". To find my missing commit, all I need to do is look through the object files for a string which I know to be in the commit message or file body, right?

Wrong! Sort of. Git objects are stored compressed (with zlib), which means simply using grep (or rg if you're cool) to search the files doesn't work. git does have a command to search through files (git grep), but for this use case it wasn't appropriate, as by default it only searches files in the checked-out working tree. Instead, we need to use some git tooling to get at the data cleanly.
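Here's a small sketch of the problem, assuming a scratch repository and a made-up search phrase. Searching the raw object files comes up empty, while asking git to decompress the object first works fine.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# Store a blob containing a searchable phrase (the phrase is invented).
id=$(printf 'a very distinctive needle phrase\n' | git hash-object -w --stdin)

# Searching the raw object files finds nothing, as they're compressed on disk.
grep -r "distinctive needle" .git/objects || echo "no match in raw objects"

# Letting git decompress the object first makes the text searchable.
git cat-file -p "$id" | grep "distinctive needle"
```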

The first step is to list all objects git knows about, including those not referenced by branches. StackOverflow to the rescue on this one. This script will list out SHAs of all objects, which can then be passed into git show to get the real content, rather than the compressed version. Piping that into a text file, I've now got an entire dump of everything git knows about my repository: commits, files, the lot.

bash ~/object-list.sh | xargs -n1 git show > ~/out.txt
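The StackOverflow script itself isn't reproduced here, so as a stand-in (not necessarily what that script does), here's a sketch using git cat-file --batch-all-objects, which needs a reasonably recent git. It only keeps commits, which is my own tweak to keep the dump smaller. The git_demo wrapper, file names and commit messages are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
# Hypothetical wrapper so the demo commits don't depend on global git config.
git_demo() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo one > post.md && git add post.md && git_demo commit -q -m "first"
echo draft > post.md && git_demo commit -q -am "orphaned draft"
git reset -q --hard HEAD~1      # the second commit is no longer on the branch

# List every object git has, reachable or not, keep only the commits,
# and dump each with git show (same shape as the one-liner above).
out=$(mktemp)
git cat-file --batch-all-objects --batch-check \
  | awk '$2 == "commit" { print $1 }' \
  | xargs -n1 git show > "$out"

# The "lost" commit message and its diff are both in the dump.
grep -n "orphaned draft" "$out"
```

Dropping the awk filter gives something closer to the full dump described above, blobs and trees included.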

It ended up being a lot more than I wanted (the file was around 79MB), but hey, I'll take having too much context over not enough!

Searching large files

For searching large files, I recommend using glogg. It's pretty barebones, but it deals with huge files incredibly well (not that 79MB is very large).

Searching through the output file, I eventually found the commits I needed. Because the file contained the output of git show, it gave me 2 options. Either I could copy the content / diff out and store it for later use, or, because git show shows commit information, I could git cherry-pick the commit SHA onto my branch, and push it. I went for the former, because it was simpler, easier, and I decided I didn't want to push those changes quite yet.
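Both options can be sketched roughly like this. It's a throwaway repository with invented file names and commit messages, and a hard reset simulates the commit going missing.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
# Hypothetical wrapper so the demo commits don't depend on global git config.
git_demo() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

echo base > base.txt && git add base.txt && git_demo commit -q -m "base"
echo article > article.md && git add article.md && git_demo commit -q -m "add article"
lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1      # article.md and its commit drop off the branch

# Option 1: copy the commit's message and diff out to a file for later.
patch=$(mktemp)
git show "$lost" > "$patch"

# Option 2: replay the commit onto the current branch using its SHA.
git_demo cherry-pick "$lost"
git log --oneline
```

After the cherry-pick, article.md is back in the working tree and the commit is on the branch again, ready to push whenever you like.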

Lessons

Whilst rather stressful in the moment, experiences like these aren't without their lessons:

  • Don't delete your local copies of articles until they're actually live, rather than just committed
  • Be careful when relying on git magic and rebasing
  • git is pretty damn good at making sure you don't lose any data