I has a blog!

I've decided that it's time to fail to have a blog again. Despite my intention to actually make more than three posts this time round, I expect I'll get fed up and kill it off again in a couple of months.

I installed Movable Type on dog a week or so ago and prepared a nice long boring entry about how much I hate BT. Dog responded by crashing and burning, completely nuking the file system. We've had to reinstall on new drives, and the last "backups" are actually the drives from the previous incarnation of dog, which was replaced with a Dell 1950 in December 2007.

So far we're not really sure what caused the data loss. Due to fail on our part we'd not enabled nproc ulimits, and this allowed a user to (unintentionally) trigger an uncontrollable fork bomb that floored the server. I decided to get the datacentre to power cycle the server: we've needed to do a kernel upgrade for ages anyway, and it seemed to be the quickest way to get dog back. It failed to come back after the power cycle. I headed to the datacentre to take a look, expecting it to be missing the firmware-bnx package since the upgrade to etch and a half (damn you, Debian). Instead it was sitting at a fsck failure. Running a fsck by hand turned up a couple of fixable errors, which I wasn't too worried about considering the hard reboot. Dog came back up and looked fine, so I headed home.

Sadly, all was not fine. About an hour later there was an oops somewhere in the ext3 code, and the file system became unreadable. We requested another power cycle, and this time dog came back up without additional human intervention. While I was able to log in, there were a lot of ext3 errors in dmesg and the root file system had been automagically remounted read-only. Not a happy file system. Suddenly ls decided that the only file on the entire file system was /lost+found, although commands were still executable if you knew the path to them. I decided that another fsck was a good plan, but this led to further corruption, and I terminated it once it started doing scary things like zeroing superblocks and trashing the ext3 journal. Now nothing was executable anymore (permission denied) and several things that should have been files had become directories.

In conclusion the file system is toast. We're hoping that we might be able to recover some data if we're lucky, but...

About this Entry

This page contains a single entry by Growler published on June 6, 2009 2:54 PM.

Powered by Movable Type 4.33-en