Drizzle 2009/03/03

The SF MySQL meetup last night [1] about Drizzle [2] was a home run, at least as far as my limited research [3] can tell. Drizzle won’t be a get-out-of-jail-free card for very write-heavy applications, but I bet it will do wonders for heavily replicated, heavily federated, read-heavy architectures (you know, normal stuff).

The common conception of Drizzle is a bit off center, so I’d like to offer my two-sentence version. First and foremost, Drizzle is a refactor of MySQL for today’s hardware and today’s architectures. Second, Drizzle is pluggable: everything from the storage engines (InnoDB is the default) to PAM authentication and replication is replaceable or removable.

I intimately understand the pros and cons of global mutexes. Much of my performance testing at OpenDNS has to do with how long our stats pipeline spends waiting for mutex locks. I have to commend Brian Aker and the Drizzle team for their commitment to removing them. I suspect read-heavy loads will see a major performance increase from this alone. (Brian was quoting something like 11-13% of time being spent in the heavily mutexed authentication code. Update: that time is spent in parsing, not in auth; still, the mutexes are bad.) The last big mutex is the one that enables MyISAM, and it sounded like Brian decided during his talk to remove it (though it could have been a convincing planned monologue), meaning you wouldn’t be able to use MyISAM in Drizzle. That’s fine by me, since I only use MyISAM in places where table-level locks are okay.
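To make the mutex complaint concrete, here is a minimal sketch of the general idea, not anything taken from Drizzle or MySQL: one counter store guarded by a single global lock, and one that spreads keys across several independent locks (lock striping). The class names, the stripe count, and the `hammer` helper are all made up for illustration. In CPython the interpreter lock hides most of the speed difference, so read this for the structure rather than as a benchmark.

```python
import threading


class GlobalLockCounters:
    """Every update, whatever the key, serializes on one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def incr(self, key):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


class StripedLockCounters:
    """Keys hash onto several independent locks, so updates to keys on
    different stripes do not wait on each other."""

    def __init__(self, stripes=16):  # stripe count is an arbitrary assumption
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [{} for _ in range(stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def incr(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            self._counts[i][key] = self._counts[i].get(key, 0) + 1

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)


def hammer(counters, keys, threads=4, rounds=1000):
    """Increment every key `rounds` times from each of `threads` threads."""

    def work():
        for _ in range(rounds):
            for key in keys:
                counters.incr(key)

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

Both classes give the same answers; the only difference is how many threads can be inside `incr` at once. With the global lock it is always exactly one, which is the situation I spend my days profiling.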

That time spent in authentication is usually a complete waste. Motivated by the countless MySQL instances running as “root” with no password, the Drizzle developers have made authentication a removable plugin that can also use PAM, just as your OS can. I hope my operator will let me just take it out.

Most folks in the room didn’t seem to pick up on this one, but there was a hand-wavy mention of a “shard bit” in the wire protocol, which would open the door to software (or even hardware!) load balancers that could understand how you federate your data.
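Since the mention was hand-wavy, I can only guess at what this would look like. The sketch below invents a packet header purely for illustration (a 16-bit shard id followed by a 32-bit payload length); it is not Drizzle's wire format, and `encode`, `route`, and the backend names are hypothetical. The point is only that a balancer could pick a backend from a fixed-offset field without parsing any SQL.

```python
import struct

# Hypothetical header, NOT Drizzle's real protocol:
# network byte order, 16-bit shard id, 32-bit payload length.
HEADER = struct.Struct("!HI")


def encode(shard_id, query):
    """Client side: prefix the query with the made-up shard header."""
    payload = query.encode("utf-8")
    return HEADER.pack(shard_id, len(payload)) + payload


def route(packet, backends):
    """Balancer side: read only the fixed-size header and pick a backend.

    The query text is never inspected, which is what would make this cheap
    enough to do in a simple proxy or even in hardware.
    """
    shard_id, _length = HEADER.unpack_from(packet)
    return backends[shard_id % len(backends)]


backends = ["db0.example", "db1.example", "db2.example"]  # placeholder hosts
packet = encode(4, "SELECT name FROM users WHERE id = 42")
chosen = route(packet, backends)
```

The modulo mapping is the simplest thing that works; a real deployment would more likely use a lookup table from shard id to host so shards can move.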

The last thing I found compelling enough to write down is that Drizzle is making heavy use of Google’s Protocol Buffers [4] as part of their query planning. Serialized, they’re used as an alternative wire format and as the replication protocol. Crazy things like replicating some Drizzle tables through a Protocol Buffer proxy into a different database engine become downright easy. For example, you could keep your main table in a heavily indexed Drizzle database while replicating a distilled version into MemcacheDB for fast primary-key lookups of the essential data.
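Here is roughly how I picture that proxy, as a sketch under heavy assumptions. `RowChange` is a plain dataclass standing in for whatever a deserialized replication message would actually contain (I don't know Drizzle's message schema, so the fields are invented), and a dict stands in for MemcacheDB. I also assume each message carries the full row, not just the changed columns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RowChange:
    """Stand-in for a deserialized replication message (hypothetical fields)."""
    table: str
    op: str                 # "insert", "update", or "delete"
    primary_key: str
    columns: Dict[str, str]  # assumed to be the full row image


class DistillingProxy:
    """Consumes row changes for one table and mirrors a trimmed-down copy
    of each row into a key-value store, keyed by primary key."""

    def __init__(self, table: str, keep: List[str], store: dict):
        self.table = table
        self.keep = keep      # the "essential" columns worth mirroring
        self.store = store    # dict here; a MemcacheDB client in real life

    def apply(self, change: RowChange) -> None:
        if change.table != self.table:
            return  # ignore tables we are not mirroring
        key = f"{self.table}:{change.primary_key}"
        if change.op == "delete":
            self.store.pop(key, None)
        else:
            # Inserts and updates both overwrite with the distilled row.
            self.store[key] = {
                col: change.columns[col]
                for col in self.keep
                if col in change.columns
            }
```

Feeding it an insert for `users` with `name` and `bio` columns while keeping only `name` leaves `{"users:42": {"name": ...}}` in the store, which is exactly the fast primary-key lookup I want without any of the indexes.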

(This reads like a press release; I’m sorry. I wrote all of this down as a cheatsheet for talking to the rest of OpenDNS about whether and when this will make sense for parts of our architecture.)