Why I dislike mySQL (an ops perspective)

| No Comments | No TrackBacks
Much has been argued over why mySQL may or may not suck. Most arguments on the subject focus on SQL standards compliance and perceived performance. Some will argue that mySQL is really easy to install and make work, and, well, hell, "everybody uses it." While I will briefly rant on performance, this post will mostly be about the operational headaches that come from trying to run mySQL as a mission critical database at a real company doing sizable quantities of traffic. I accept that it does have many potential use cases where my issues are irrelevant, but rather unfortunately, in my experience, there are a very large number of people out there who are utterly unaware that they're "doing it wrong." Something to note is that "real world" deployments regularly make use of a mix of table types (often completely inappropriately), and are often running what are considered "out-of-date" versions, or with a poor choice of options (not having per-table tablespaces for example.)
My experience comes from work at several companies who made heavy use of mySQL, and have support contracts. I've also had the misfortune to provide consultancy services to hosting customers while working at a reasonably big name hosting company.

Several of my issues focus around disaster recovery. Disasters come in many flavours, ranging from some toss-pot deciding to power cycle your database server, up to your data centre burning to the ground. Your mission critical data needs to be recoverable in the shortest possible time. It needs to be correct, preferably contain ALL of those pesky transactions that customers have been told were successful, and most definitely it needs to be consistent. Consistency in this context requires that those irritating operations that occurred as part of a single transaction are either all still there, or are all missing. Everyone likes to pretend that nothing bad will ever happen to their data, but it's bound to. At some point you'll have a PSU failure, a breaker trip, motherboard explode, your hosting provider reboot your database server after getting confused about rack locations, or just that you've been daft enough to stick it in a VMware ESX cluster and HA has gone insane.

First off a bit of a rant about myISAM. This rant contains nothing new, it's all well documented. Unfortunately people still haven't got the hint and stopped using it for critical data. As a starter it's really prone to corruption. This is mostly due to writes to tables not being an atomic operation, and not having any way to deal with a write operation that only partially completed (due to power failure or kill -9 or crashing horribly.) Your data is going to get maimed if something bad happens, no matter what you do. If you're lucky you'll just lose recent transactions, but sometimes random chunks of your table will be consumed. But nevermind, because it turns out that write operations of any sort are horribly bad for performance and something best avoided. MyISAM tables rely on taking entire table locks when a complex write operation (most often reuse of space that has been freed by earlier deletes) is in progress. Any selects currently running are allowed to complete, but any that come in after your write now sit waiting for the lock to be released. This makes a combination of long running selects or lots of concurrent selects plus any sort of write extremely bad news. The good news is that setting concurrent_inserts to 2 in your config file will mostly avoid the locking issues, but this wont save you from the data corruption issues. An inescapable problem is that after a crash you MUST run a myisamchk or there's a good chance that things will start behaving in unexpected ways (selects might just hang, some rows will go missing, etc), and on a large table this will take forever.

Now something about Innodb. Until a few versions ago mySQL defaulted to sticking all your Innodb table data into one giant file. Unfortunately lots of people still haven't moved to using a file per table. This is a tad evil as it turns out that free space in Innodb tables isn't reused, and deleting stuff doesn't reduce the size of the file. If you're using a file per table you can rebuild each table that you want to shrink, but if everything is in one file then welcome to a world of pain. This irritating drawback has caused people who should know better to use myISAM tables in their applications. Another headache with Innodb is that by default mySQL doesn't sync writes to the transaction log to disk. This means that on a crash/power failure/etc, while you wont have corrupted tables, you might lose the last N transactions. This may not sound so bad, but it's still a bit of a pain when your app believes them to have succeeded, and means that by default it's not ACID. A quick config change can fix this, but there will be significant performance implications.

There are some additional problems that come with mixing  table types. I've run into problems after a failure where the Innodb tables all lost the same time frame of data (since the last transaction log sync), but the myISAM tables lost a completely different time period. This has left my databases in a horribly inconsistent state.

So, on to backups. You want to make a backup of your mySQL database server. This isn't exactly an unreasonable request, accidents do happen after all.

An obvious approach to backups is to mysqldump your database. Unfortunately if you have any myISAM tables then unless you take a lock on all of your tables while you do this dump it's not going to be consistent: an event that requires inserts to two tables might end up inserting to table A after the dump has finished with it, but inserting to table B before your dump has reached it. This is far from ideal and potentially very unpleasant if you need to restore from said backup. If all your tables are Innodb then you can do the dump in a transaction, which will guarantee consistency (well, as long as no one is going about making DDL changes while your backup runs, but that's a reasonable assumption unless your app is utterly broken.) Taking a lock on all your tables is probably going to lead to an outage, so isn't the best of options. Even without the lock, and accepting the consistency issues, myISAMs concurrent access scheme (discussed earlier) means that on a large table your dump will probably cause issues. A further down side of your dump is that it's not incremental. You're going to get a giant file containing all your data every night.

As an alternative you could do a "hot backup", copying the raw mySQL data files off to your backup server. This is a really bad idea. Updates to the raw data files are NOT atomic, so copying a load of data files out from under a running mySQLd will corrupt tables. If you're lucky it will at least start up with a restore from your "backup", but any myISAM tables being written to when you took your "backup" will be horribly corrupted, and Innodb doesn't like this sort of behaviour too much either. The safe way to do this is either by shutting down mySQL, or taking a lock on everything and flushing tables, before doing your backup. This obviously leads to application issues. As per taking a mysqldump of your tables, this method also fails to take advantage of incremental backups as any table that changes will get backed up in full every backup run.

Finally you could rely on the snapshot feature of your filesystem. This is an extremely dangerous option if your file system doesn't support snapshotting multiple files simultaneously. At present this isn't a particularly appealing option but in the future, with file systems capable of low cost snapshots, it may become the norm. It still suffers from the lack of incremental backup possibilities.

Right, so backups directly off your primary database server are not a great plan! Lets try this native replication thing, and run our backups (and inconveniently resource intensive reporting queries) off a slave. That way we can use any of the above methods without any risk to our actual service. Hell, we may as well throw in some nice slaves for our application to use for read-only purposes (this throws up some interesting problems by itself, but they are in no way unique to mySQL.)

mySQL native replication is a feature regularly used to win the PostgreSQL vs mySQL argument. It seems to have been a major factor in many company's decisions to use mySQL. It's true that PostgreSQL doesn't have built in replication (at least, not yet), but it does have several 3rd party replication solutions, and they all work a hell of a lot better than mySQL's native option. At this point I must admit that I have not used native row based replication. This is because I'm yet to come across a replication set that contains no myISAM tables. Unfortunately myISAM rules out anything but statement based replication (although more recently "hybrid" is an option.)

A massive headache with statement based replication is that a statement run on one server may not produce the same results on a slave. An obvious example is use of RAND(), a less obvious one is due to use of NOW(). An extremely unpleasant case occurs where tables in the master and slave are out of sync (myISAM corruption, or lost Innodb commits, after a crash being a common cause), at which point an INSERT INTO...SELECT... will have different results on different nodes, amplifying the differences (and often leading to unique constraint violations at a later date.)

Inconsistency between nodes is something you really don't want, and avoiding it is a right pain. If you have a crash of your slave you MUST rebuild it from the master rather than getting it back up and replicating. If you have a crash of your master you should either fail over to a slave, or rebuild ALL of your slaves from the master.

Of course it turns out that after a crash there's a good chance replication just wont start again anyway. By default the binary log (basically just a statement log, which is used for replication) is not synced to disk. This means that in a crash of the master it'll probably lose a load of writes. When your master comes back up the slaves will have replicated a load of rows that the master no longer knows exists. The slaves will refuse to restart replication because the binary log position they request from the master doesn't exist anymore. While you can manually set a new bin log position this will leave you in a rather unpleasant state, with slaves having some rows the master doesn't. To add to the fun you also can't guarantee that what's in the binary log for replication after a crash contains everything that has actually been written to the data files. Your best option is to rebuild the slaves. Unexpected loss of a slave can leave you in a similar state thanks to the unpredictable myISAM table corruption that is likely to be visited upon you.

So, rebuilding a slave... a simple task, right? Yes, if you have another slave that you can shut down and take a copy from. If you don't then you're going to need to take a consistent dump from the master. Hey, doesn't this sound like the backup problem from earlier? Yep, that's because it's basically the same issue - you MUST have a consistent snapshot, and, in a mixed table type environment, that means locking everything (and thus probably downtime.)

The general headache farm that comes with native mySQL replication leads a lot of people to try to use drbd to provide a hot-standby capability. This works surprisingly well if you have only Innodb tables, however you need to remember that promoting your secondary after a master exploding is equivalent to dealing with a box that has just been unexpectedly power cycled - you're going to have exciting problems with loss of committed transactions and, most irritatingly, myISAM table corruption. Before your secondary can be started you've got to do a myisamchk, and this can easily take hours if you have large myISAM tables. Your hot-standby isn't.

So, isn't all of this normal? The massive user base, including many massive names, suggests that mySQL is the best of breed. Surely the alternatives (aside from those costing lolmoney) have all these problems and more! Fortunately not. I'll stick to PostgreSQL as that's the alternative that I'm most familiar with. In the PostgreSQL world hot backups are a nice and boring affair, have the possibility of continuous archiving of transaction logs to a remote host, and yet work rather nicely with incremental backup solutions. There's only one table type and it's well behaved in crashes. Replication is a bit of a sore spot, but, despite its complexity, I've had less issues with the nightmare that is Slony than I have with mySQL native replication. Slony even deals with the entire initial replication sync all by itself and without taking your master off-line.

Oh, and mySQL "multi-master" replication IS NOT  MULTI-MASTER REPLICATION. What you are doing is abusing the fact that mySQL replication has NO sanity checks, and doesn't have any way to enforce read-only on slave tables. You've just set up master->slave replication in both directions, and there's nothing to stop a large number of very, very bad things happening so you're going to have inconsistencies between your database servers. Additionally you've probably done it because you think that your app is write heavy and can benefit from having more than one database to write to... GUESS WHAT: You're wrong. All of the statements executed on master A are sent to B, all those executed on B are sent to A. Both database servers have to execute everything. You've just created yourself a recipe for disaster with little to no benefit. Even the documentation says this, stop trusting random blog articles (yes, I know this is a blog article) by random people who do one query per year on their database.

So, what is good about mySQL? It's quite simple and there's no pg_hba.conf. It has nice SQLish commands like "SHOW TABLES", rather than psql's backslash commands. If you want insane single threaded, sequential, select performance on a table with absolutely NO writes then myISAM is the table type for you! On a more serious and useful note, the massive selling point of mySQL is the plugable storage backends. Sure, for normal usage these are one of its headaches - why can't the damn developers pick one storage engine and make it work well, rather than changing their minds every 30 seconds and leaving yet another in a half-arsed state. As a developer, however, you've got a SQL front end and supporting libraries, and can throw together backends that interface with some obscure, home made, storage systems. You're able to make various odd services available through the same front end server, using a common query language that quite a lot of people understand. This is probably the reason that quite a few big names with specialised requirements are making use of mySQL.

Oh, and insert a rant about how shit the mySQL query planner is here :-)

Coming soon, a "what I dislike about PostgreSQL."

No TrackBacks

TrackBack URL: http://growler.woaf.net/cgi-bin/mt/mt-tb.cgi/3

Leave a comment

About this Entry

This page contains a single entry by Growler published on December 31, 2009 11:00 AM.

Find recent content on the main index or look in the archives to find all content.

Categories

Pages

Powered by Movable Type 4.33-en