TokuDB

Big news at Percona Live MySQL Conference and Expo this week, Tokutek open sourced TokuDB thus making my previous post Where are they now: MySQL Storage Engines out of date in just a few days.

In this case, I really don’t mind. It’s rather exciting that they’ve gone ahead and done this –  and it’s not just a code drop: https://github.com/Tokutek/ft-engine is where things are at, and recent commits were 2hrs and 18hrs ago which means that this is being maintained. This is certainly a good way to grow a developer community.

While being a MySQL engine is really interesting, the MongoDB integration is certainly something to watch – and may be something quite huge for Tokutek (after all, it’s the database engine parts of Mongo that are the most troublesome – along with the client library license).

A few notes on InnoDB in MySQL 5.7.1

I’ve started poking around the MySQL 5.7.1 source tree (although just from tarball as I don’t see a BZR tree yet). I thought I’d share a few thoughts:

  • InnoDB temporary tables. Not REDO logged. What does this mean? It’s a huge step in removing the dependency on MEMORY and MyISAM engines for temporary tables used in query execution. With InnoDB temporary tables there is no reason for MEMORY engine to continue to exist, there is absolutely no way in which it is better.
  • InnoDB temp tables aren’t insert buffered
    This probably doesn’t really matter as you’re not going to be doing REDO logging for them (plus things are generally short lived)… but it could be a future area for performance improvement
  • The NO_REDO log mode appears to be implemented fairly neatly.
  • Improvements in innodb read only mode. What does this mean? Maybe we can finally get rid of the oddity of compressed read only MyISAM tables on read only media. (on the other hand, CDs and DVDs aren’t exactly a modern form of software distribution).
  • Some of the source code comments have improved.. it’s getting easier to understand InnoDB. I’d still make the argument that if you need source code comments you’re code isn’t clear enough… but any step is an improvement. (that being said, InnoDB was always easier than the server)
  • There is some pretty heavy refactoring of lock0lock.cc – I really need to sit down and poke at it a bunch.
  • The shared tablespace code (innodb system tablespace) has been heavily refactored. This also introduces tablespaces for temporary tables – and it appears to be implemented in the correct way.

I need to look into things a bunch more, and it’ll be really useful to see a bzr tree to better understand some of the changes.

More to come later, but that’s my quick look.

The MEMORY storage engine

I recently wrote about Where are they now: MySQL Storage Engines and The MERGE storage engine: not dead, just resting…. or forgotten. Today, it’s the turn of the MEMORY storage engine – otherwise known as HEAP.

This is yet another piece of the MySQL server that sits largely unmaintained and unloved. The MySQL Manual even claims that it supports encryption… with the caveat of having to use the SQL functions for encryption/decryption rather than in the engine itself (so, basically, it supports encryption about as much as every other engine does).

The only “recent” innovation in the MEMORY engine was the dynamic row patch that ended up making its way into Percona Server (and isn’t enabled by default). This forced me to go and look at the code of the MEMORY engine again and I cannot possibly drink enough in my lifetime to erase the memory.

The MEMORY engine is used by just about everybody as you probably have a SQL query somewhere that uses an in memory temporary table. I can, however, feel the comments being added to this post right now by people who use gdb to set server variables that not a single query in their systems use MEMORY….. (IIRC there have been some patches around that would throw an error rather than create a temporary table)

We had a early version of the dynamic row format patch in Drizzle for a while… and if you turned it on, all sorts of things horrifically broke. It was a remarkably non-trivial amount of work to get that code to work properly and this is largely a testament to the “design” of the MEMORY engine.

While it may be efficient or fast or something (likely on 1990s hardware and workloads), it misses the boat completely on the things that matter today: simultaneous access, MVCC, BLOB/TEXT columns and transactions. Basically, it’s a engine that’s really only useful for a single connection in limited use cases…. and even then, it’s likely a good way to ruin things. MyISAM is better as at least on memory pressure things may be written out to disk sensibly…. and if InnoDB had a “don’t log this table” mode it would beat that absolute pants off it.

It is, again, another part of the MySQL server that’s remarkably hard to pull out and replace with something different/better. Why? Well, I wrote about it before: Refactoring Internal temporary tables (another stab at it). If it was easy, we’d likely have Tokyo Cabinet (via BlitzDB) or similar (some bit of code maintained by other people) doing the same job in Drizzle rather than this large chunk of code that nobody really cares about.

The MERGE storage engine: not dead, just resting…. or forgotten.

Following on from my fun post on Where are they now: MySQL Storage Engines, I thought I’d cover the few storage engines that are really just interfaces to a collection of things. In this post, I’m talking about MERGE.

The MERGE engine was basically a multiplexer down to a number of MyISAM tables. They all had to be the same, there was no parallel query execution and it saw fairly limited use. One of the main benefits was that then you could actually put more rows in a MyISAM table than your “files up to 2/4GB” file system allowed. With the advent of partitioning, this really should have instantly gone away and been replaced by it. It wasn’t.

It is another MySQL feature that exists likely due to customer demand at the time. It’s not a complete solution by any means, PARTITIONING is way more complete and universal…. and much harder to get right inside the MySQL server – which is why MERGE exists. It was easier to write a storage engine that wrapped MyISAM than it was to have any form of partitioning in the server.

One advantage of MERGE tables is it means that you could parallelize myisamchk to repair your broken MyISAM tables after a crash. One step better than no crash safety is at least parallel recovery. The disadvantage being that you’re using MERGE and MyISAM tables.

There is also the great security problem of MRG_MYISAM (the other name for MERGE tables): if you create a MyISAM table t1 and have a user able to access it, if they can create a MERGE table that accesses t1 (say m1) and you then revoke their access to t1, they’ll still be able to access t1 through m1.

MERGE still seems to exist in MySQL 5.6 without even a warning that it’ll go away… which I suspect it will…. we long since got rid of it in Drizzle as, well, what you really want is a query rewrite engine that does views, partitioning etc etc.

Can anyone think of a reason why you should still use MERGE tables in 2013? I can’t.

Detecting if a MySQL server supports partitioning

This morning, this Percona XtraBackup bug came to my attention: https://bugs.launchpad.net/bugs/1170340 – basically, it’s now really quite tricky to determine if a MySQL server you’re connected to supports partitioning or not.

If you’re connected to anything less than MySQL 5.6, you can use have_partitioning variable. But since that’s gone in 5.6, you’re going to get a false negative if you’re connected to 5.6. You could use INFORMATION_SCHEMA.PLUGINS table, but that’s not there in 5.0, so you have some added workarounds to add there too.

A simple version check could be the solution… but what if you compiled the server without partitioning support?

MySQL Community Portraits needs your $$

There’s an Indiegogo fundraising effort for Julian Cash to come to the Percona Live MySQL Conference and Expo this year to take your photo! I’ve thrown in a bunch of money to help make this happen. Why? It’s much much much cheaper than getting any professional photo shoot done, and it’s by the awesome Julian, who makes pretty awesome photos.

You’ll get full resolution images too! Basically, this is the cheapest way you’re going to get this quality of photos done of yourself outside of dating a professional portrait photographer. Since I’m not dating a professional portrait photographer, this is an excellent and affordable way to have some truly awesome and up to date photos that I can use.

I went and bought the Gold level sponsorship as I feel that $250 is a small price to pay (especially considering Julian’s previous excellent work). You can support it with more $ or less $ (even $0), I chose $250 as it meant I could claim, at least for a short time, that I’ve over half funded it :)

Where are they now: MySQL Storage Engines

There was once a big hooplah about the MySQL Storage Engine Architecture and how it was easy to just slot in some other method of storage instead of the provided ones. Over the years I’ve repeatedly mentioned how this wasn’t really the case and that it was remarkably non trivial.

Over the years there have been many storage engines crop up and then disappear. So… where are they now?

  • ISAM
    This became MyISAM…. you know you’ve been around MySQL a long time if you’ve ever had to deal with an ISAM table.
  • Gemini
    This was the first big test of the GPL in court. Basically, you have to obey the GPL (see wikipedia for more info). The code was released as GPL and development stopped. This has been dead since ca 2002.
  • Amira – http://launchpad.net/amira
    Antony first mentioned this in 2008 on his blog. This was a continuation of the Gemini engine, you can actually go over to launchpad and get the code. This was one of the projects to have a transactional storage engine not owned by Oracle after Innobase Oy was acquired by them. It went nowhere special as Netfrastructure was acquired which became Falcon.
  • BDB
    otherwise known as the BerkeleyDB engine. It was seldom used and never gained much of a userbase. It was unceremoniously dropped back in 2006 and both users didn’t really exist.
  • PBXT - http://pbxt.blogspot.com/
    I think we can credit PBXT with at least half of the features and performance improvements to InnoDB since it first emerged back in 2006. It got attention very quickly. Why? Because it was different. It had the very rare ability to outperform InnoDB in some places. You can still find PBXT in MariaDB, but sadly it can be hard to fund development of a MySQL storage engine, especially one as tied to MySQL as PBXT is, and it’s no longer under active development. Closely related was the Blob Streaming project which was way ahead of its time as an AlsoSQL access method. The good news is that the code was released under a BSD license in 2012 (was previously GPL). We even had PBXT in Drizzle for a while.
  • Blob Streaming (PBMS) - http://bpbdev.blogspot.com/
    This project was closely related to (but not depending exclusively on) PBXT. It embedded a HTTP server inside the database and could use it to read and write BLOBs. This was not only fairly cool but way ahead of its time. We owe the existence of both HandlerSocket and the memcached interface to InnoDB to PBMS (it was also an inspiration for the JSON server plugin for Drizzle, to address some of the use cases of the PBMS plugin).
  • Federated
    It’s still there… but is effectively unmaintained and dead. There’s even FederatedX in MariaDB which is an improvement, but still, the MySQL server really doesn’t lend itself kindly to this type of engine… it’s always been an oddity only suitable for very specific tasks.
  • Archive
    Although useful, effectively unmaintained. I kinda don’t want to say dead… but if it went away, I wouldn’t exactly be surprised.
  • CSV
    Currently used to access the log tables in MySQL… and hardly used otherwise. It’s odd that the same code doesn’t deal with SELECT INTO OUTFILE and LOAD DATA INFILE, and I doubt this will ever change. I’d say effectively niche/dead.
  • SolidDB
    Purchased by IBM, abandoned.
  • DB2
    Only ever on System i. Useful for very very few people… but you can still find it around if you’re one of them.
  • Infobright
    OMG it exists! This is probably because they’re largely just using the MySQL server as a way to implement the MySQL network protocol and all of the heavy lifting is done by their own code.
  • Xeround
    I’m quite surprised these guys are still around, as they’re a proprietary storage engine as a service, and initial testing wasn’t entirely promising.
  • TokuDB
    I cannot emphasize how much more interesting TokuDB would be if it were open source. It actually holds some promise… and with their recent work with mongo, perhaps this is a good way forward for them…
  • Maria/Aria
    Another “OMG Oracle just bought Innobase Oy” engine. This was a project to take MyISAM and turn it into a lean, mean, transactional storage engine machine. It’s still not there and I don’t think it ever will be.
  • Falcon
    This was the hot new thing. It came out of Netfrastructure, which MySQL AB acquired in order to help get a transactional storage engine after Innobase Oy was acquired by Oracle. If you’re keeping count, that’s three projects for a transactional storage engine. Falcon was the star though, receiving all the press and publicity (well before it was ready). There are many reasons why Falcon isn’t around today – the chief one probably being that Oracle bought Sun who had bought MySQL and thus a need for an “InnoDB replacement” instantly vanished. There was also immense management pressure for performance to be greater than InnoDB, without any allowance for or focus on correctness…. and this showed. This was quite disappointing as Falcon had a lot of good architectural things going for it.
  • BlitzDB - https://launchpad.net/blitzdb
    I had hoped we’d replace MyISAM with BlitzDB in Drizzle. It was a wrapper around Tokyo Cabinet to the storage engine API in Drizzle. Unfortunately, the ties to MyISAM are incredibly deep (see my recent post on internal temporary tables) and we never quite got there.

I think this is all the notable engines that were aimed at widespread adoption… what ones have I forgotten?

It’s interesting to note that only Archive, CSV, Xeround, TokuDB and Infobright can be gotten anywhere, and the latter two only in their own distribution (one proprietary) and Xeround only as a service.

30 configuration options and counting

While Domas may have rather effictively trolled the discussion with his post on howto configure table/user statistics (which gave me a good chuckle I do have to say), it’s at least incorrect for Percona Server as you have to enable the “userstat” server option :)

That being said, once enabled there are no extra configuration variables to think about. This is a huge advantage over configuring PERFORMANCE_SCHEMA – which has a total of THIRTY configuration options (31 if you include the global enable/disable option).

Some of these thirty odd configuration variables are only going to matter if you’re loading your own plugins, and even then, it’s probably only going to matter if they use the MySQL mutex implementations rather than, say, the standard pthread ones or even other synchronization primitives. It helps that the vast majority of non-InnoDB storage engines are dead. Go on – name one that’s in any form of usage (MyISAM doesn’t count – it’s effectively on death row).

This really makes me want to go and resurrect and finish the perf integration with Drizzle. The operating system provides a whole bunch of performance monitoring tools already, just expose them via SQL and be done with it.

More on the FRM file format (and the minimum maximum number of columns in MySQL)

Over at the work blog, I wrote about what the true maximum number of columns in MySQL is as well as the minimum maximum. Basically, the FRM file is ass and places bizarre arbitrary limits on things due to what can only be seen as “limitations” in 1980s computing and automatically generated interfaces for entering rows on a 80×24 character VT100 terminal.

Full post here: https://www.percona.com/blog/understanding-the-maximum-number-of-columns-in-a-mysql-table/

Drizzle and Google Summer of Code 2013

Those interested in hacking on Drizzle for the Google Summer of Code this year should certainly jump on the mailing list and IRC channel and work out what they may want to do. I (and others) are happy to supervise this year.

Drizzle is a relational database server with an accessible C++ code base that has been involved with GSoC for a number of years now.

Refactoring Internal temporary tables (another stab at it)

A few weekends ago, I started to again look at the code in Drizzle for producing internal temporary tables. Basically, we have a few type of tables:

  • Standard
  • Temporary (from CREATE TEMPORARY TABLE)
  • Temporary (from ALTER TABLE)
  • Internal temporary (to help with query execution)

If you’re lucky enough to be creating one of the first three types, you go through an increasingly lovely pile of code that constructs a nice protobuf message about what the table should look like and hands all responsibility over to the storage engine as to how to do that. The basic idea is that Drizzle gets the heck out of the way and lets the storage engine do its thing. This code path looks rather different than what we inherited from MySQL. For a start, we actually have a StorageEngine object rather than just lumping everything into the handler (which we correctly name a Cursor). However… the final part, the internal temporary table code is a bit closer to what we inherited from MySQL. There is a good reason for that, it’s ass.

For a start, the table::Singular object is still abused by Item_sum_distinct (see the setup() method) as a tuple (a table with no actual table). This is not ideal and just throws a spanner in the works for refactoring a bunch of code.

The second big problem is that create_tmp_table() doesn’t actually use any normal API calls, instead it manually sets up the table::Singular object. This includes setting up the fields for the table::Singular object in a slightly different way depending on which bit of code called create_tmp_table().

The third big problem is that it’s not storage engine agnostic. Instead of using any existing and sensible way to go and create a temporary table by using the storage engine API it instead creates a series of MI_COLUMNDEF structures which as you may be able to guess, are MyISAM specific and internal data structures.

The forth big problem is that if we end up using HEAP (again, like MyISAM, hard coded) we don’t even call the create table method on the engine. The HEAP (or MEMORY engine as it’s now known) is magic in that it can create tables on open()!

All of these issues make it really, really hard to have another engine with the ability to handle internal temporary tables. You may recall that MariaDB does include the ability to use the Aria engine for internal temporary tables. No, they did not refactor any of this code, they just made a copy of the code and put in Aria where MyISAM was along with some #ifdef for the feature.

Over the past several years I’ve tried a few times to tease this code out and start the process of turning it into something that is palatable. Every one of those times I’ve either failed or gotten sufficiently frustrated that I’ve given up.

I now have a new strategy though. After looking at the code for a good few hours a few weekends ago, I think I have an idea of where to start…. (now just for a few more free weekends to implement it).