OpenDNS MySQL abuses
Fresh off the beginning of Kellan’s series on Flickr’s catalog of MySQL abuses, I thought it would be good to document the ways OpenDNS Stats uses and abuses MySQL. At Velocity last year I glossed over some MySQL stuff in favor of spending more time talking about std::bad_alloc
and fake map/reduce. OpenDNS Stats uses MySQL in three ways, ranging from normal to awesomely stupid.
The most normal case is a single InnoDB database tracking total DNS requests for each network using stats. The machine is modestly beefy so the dataset can fit in memory. INSERT
s come containing up to 100,000 sets of values and uses ON DUPLICATE KEY UPDATE
to get +=
behavior. Each of these INSERT
s contains values that are pre-sorted in PRIMARY KEY
order to avoid deadlock: “Access your tables and rows in a fixed order. Then transactions form well-defined queues and do not deadlock.”
The second case is normal but federated. Every domain name encountered is hashed and stored using the packed 20-byte SHA1 as the PRIMARY KEY
. The leading bits of the SHA1 are used to determine which master stores that domain.
The awesomely stupid case is storing top domains data. There’s a lookup table that maps networks to their shard, on which each network has its own MyISAM tables. The tables are MyISAM because of the way InnoDB handles data dictionaries. Because the tables are MyISAM, DELETE
s are evil: MyISAM can’t INSERT
without extra locks if tables have holes. To delete old data, a new table is swapped into place using RENAME TABLE
and all data that is not being deleted is INSERT
ed into the new table. To cope with so many tables, the key_buffer
is set to about 1/3 of memory to leave more than usual for the filesystem cache and the table_cache
is set to 300,000. This last bit requires mysqld_safe
to be patched to run ulimit -n 600000
before mysqld
is started.
Data is stored by day and domain hash with an integer field for each hour of the day (in UTC). Two rows can then be used to build a day as observed in any timezone. The domain hashes are converted to domain names at read time, accelerated by tons (not like Facebook but still a lot) of memory devoted to memcached
.
Nothing we do is terribly groundbreaking but it reinforces Kellan’s point that MySQL is still a pretty heavy hammer.