Oil spills

There's been a fair bit of frothing in the press about the Deepwater Horizon oil rig exploding, mostly focusing on the escape of oil. I noticed a BBC News article today which has some rather suspect claims. I'm sure they're not the only ones, but even basic fact checking raises a few questions.

Moving jobs (again)

Yes, it's that time of year again: I'm defecting.

Since leaving ADVFN I've been chasing the money and moving increasingly towards pure ops roles. I don't enjoy that sort of work. I don't like supervising releases, I dislike doing deployments, fiddling with config files is tedious, and being called at 2am sucks. The only bits I do enjoy are developing useful tools, and working out what chain of events led to a complex mass of interconnected apps falling over. When there's a limited number of tools that need writing, and a comparatively simple app to support (and with me having no access to the source), it's just not challenging enough to be interesting. So, erm, aren't I supposed to be senior engineering staff? Why am I doing support work? I get the feeling that I'm employed just in case something really bad happens, and the rest of the work is just there to fill the time when things are working.

Fortunately I've been offered the chance to escape into a development role. Sure it's going to mostly be PHP, and yes it's a pay-cut, but I'm actually hoping that I might finally be making a good employment choice. That'll be a first.

I start at the new place on the 12th of April; however, I've actually only got 3 working weeks left as I'm off skiing for 3 weeks before then.

New MythTV setup

I decided it was time to replace my current MythTV setup. The hardware had irritated me for some time (noisy, and required irqpoll to boot), and the software was now some versions behind and seemed to have mangled its database. So it was time to purchase some new toys, and lose multiple hours to the headaches of configuring them. It wasn't as easy as expected, and there were some exciting pitfalls that hopefully this will help others avoid.

Yes Phil, that means this is going to be boring.

Why I dislike mySQL (an ops perspective)

Much has been argued over why mySQL may or may not suck. Most arguments on the subject focus on SQL standards compliance and perceived performance. Some will argue that mySQL is really easy to install and make work, and, well, hell, "everybody uses it." While I will briefly rant on performance, this post will mostly be about the operational headaches that come from trying to run mySQL as a mission critical database at a real company doing sizable quantities of traffic. I accept that it does have many potential use cases where my issues are irrelevant, but rather unfortunately, in my experience, there are a very large number of people out there who are utterly unaware that they're "doing it wrong." Something to note is that "real world" deployments regularly make use of a mix of table types (often completely inappropriately), and are often running what are considered "out-of-date" versions, or with a poor choice of options (not having per-table tablespaces for example.)

Why I can't deploy IPv6

I'm one of those people who'll use any excuse to play with the latest or most interesting technology. It should therefore come as no surprise that I'm keen to see IPv6 deployed both at home and at work. Sadly I've come to the conclusion that it's just going to upset my customers.

Home is the simplest problem to explain. While my ISP (Enta) are able to support native v6, I own a Draytek 2820 which lacks v6 support. Draytek inform me that they have no interest in implementing support. At $0rkplace our office Internet access provider lacks any interest in providing v6. These two problems mean that I have to resort to tunnelling to have any hope of testing any v6 services that I deploy. Not a huge problem, but at least a little irritating.

At $0rk we're, in theory, in a great situation. We have complete control over our shared web hosting environment, control over the load balancers and firewalls of our enterprise customers, and run DNS for almost all customers. This ought to allow us to make almost any customers' site/service available over v6 without them having to know or care. Unfortunately this isn't the case.

The big problem I have is providers with partial v6 tables and consumers with broken DSL modems. If I were to add an AAAA record for our main web site then hosts that support v6 will prefer that over the A record and attempt to make a v6 connection. If that host is on a network that doesn't have a route to our v6 prefix then they'll hang around for a timeout before attempting a v4 connection using the A record. This timeout is long enough to easily be noticeable, and quite unpleasant, to the user. Even if this problem is limited to a small number of end-users I'm still in trouble: how do I explain to a customer that their web site is "slow" for a user because I've made it available over IPv6? They'll just want me to disable this evil IPv6 thing.
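To make that fallback concrete, here's a minimal Python sketch of a client that tries the v6 address first and only falls back to v4 once the v6 attempt has failed or timed out. It's an illustration under assumptions rather than how any particular OS or browser actually behaves: the function names (`order_candidates`, `try_connect`, `connect_dual_stack`) and the 3 second timeout are made up for the example, and real resolvers apply more elaborate address selection rules.

```python
import socket


def order_candidates(addrinfos):
    """Put IPv6 (AAAA-derived) addresses ahead of IPv4 ones, keeping relative order."""
    return sorted(addrinfos, key=lambda info: info[0] != socket.AF_INET6)


def try_connect(family, sockaddr, timeout):
    """Attempt one TCP connection; raises OSError on failure or timeout."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_dual_stack(host, port, timeout=3.0,
                       resolve=socket.getaddrinfo, connect=try_connect):
    """Try each address in turn, v6 first.

    Returns (connection, attempts). Every failed attempt costs the user up to
    `timeout` seconds before the next address is tried, which is the delay
    described above.
    """
    infos = resolve(host, port, type=socket.SOCK_STREAM)
    attempts = []
    for family, _type, _proto, _canon, sockaddr in order_candidates(infos):
        try:
            conn = connect(family, sockaddr, timeout)
        except OSError:
            attempts.append((family, "failed"))
            continue
        attempts.append((family, "ok"))
        return conn, attempts
    raise OSError("no address for %s was reachable" % host)
```

The `resolve` and `connect` parameters are only there so the behaviour can be exercised without a network. With a host that has both record types but no working v6 route, the attempts list shows a failed v6 entry followed by a successful v4 one, and the user has sat through the whole timeout in between.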

Of course, this problem doesn't just impact the content provider. A v6 enabled user will find a site that has AAAA records but lacks v6 connectivity (due to either the content or access provider taking only a partial table) to be horribly slow. This either encourages the user not to use the site, or makes them disable IPv6 support. Not ideal.

Providers who are experimenting with providing services over IPv6, but who lack a full table (or a working network), are making IPv6 deployment unpopular from an end-user perspective. A provider with a broken network encourages end users to blame IPv6 and disable support in their OS. Additionally it discourages content providers from making services dual stack. Until this problem goes away (which it won't) I cannot justify making our services (and our customers' services) dual stack. Yeah, sure, I can do a Google and have new names for my v6 services but who the hell's going to make use of them? Why would anyone make the effort to use http://ipv6.0rkplace/ rather than http://www.0rkplace/?

I has a blog!

I've decided that it's time to fail to have a blog again. Despite my intention to actually make more than three posts this time round, I expect I'll get fed up and kill it off again in a couple of months.

I installed Movable Type on dog a week or so ago and prepared a nice long boring entry about how much I hate BT. Dog responded by crashing and burning, completely nuking the file system. We've had to reinstall on new drives, and the last "backups" are actually the drives from the previous incarnation of dog which was replaced with a Dell 1950 in December 2007.

So far we're not really sure what happened to cause the data loss. Due to fail on our part we'd not enabled nproc ulimits, and this allowed a user to (unintentionally) trigger an uncontrollable fork bomb that floored the server. I decided to get the datacentre to power cycle the server: we've needed to do a kernel upgrade for ages anyway, and it seemed to be the quickest way to get dog back. It failed to come back after the power cycle. I headed to the datacentre to take a look, expecting it to be missing the firmware-bnx2 package since the upgrade to etch and a half (damn you Debian.) It was sitting at a fsck failure. Running a fsck by hand turned up a couple of fixable errors, which I wasn't too worried about considering the hard reboot. Dog came back up and looked fine so I headed home.

Sadly all was not fine. About an hour later there was an oooops somewhere in the ext3 code, and the file system became unreadable. We requested another power cycle, and this time dog came back up without additional human intervention. While I was able to log in there were a lot of ext3 errors in dmesg and the root file system had been automagically remounted read-only. Not a happy file system. Suddenly ls decided that the only file on the entire file system was /lost+found, although commands were still executable if you knew the path to them. I decided that another fsck was a good plan, but this led to further corruption and I terminated it once it started doing scary things like zeroing superblocks and trashing the ext3 journal. Now nothing was executable anymore (permission denied) and several things that should be files had become directories.

In conclusion the file system is toast. We're hoping that we might be able to recover some data if we're lucky, but...