MySQL removes the FRM (7 years after Drizzle did)

The new MySQL 8.0.0 milestone release that was recently announced brings something that has been a looooong time coming: the removal of the FRM file. I was the one who implemented this in Drizzle way back in 2009 (July 28th 2009 according to Brian)- and I may have had a flashback to removing the tentacles of the FRM when reading the MySQL 8.0.0 announcement.

As an idea for how long this has been on the cards, I’ll quote Brian from when we removed it in Drizzle:

We have been talking about getting rid of FRM since around 2003. I remember a drive up to northern Finland with Kaj Arnö, where we spent an hour talking about this. I, David, and MontyW have talked about this for years.

Soo… it was a known problem for at least thirteen years. One of the issues removing it was how pervasive all of the FRM related things were. I shudder at the mention of “pack_flag” and Jay Pipes probably does too.

At the time, we tried a couple of approaches as to how things should look. Our philosophy with Drizzle was that it should get out of the way at let the storage engines be the storage engines and not try to second guess them or keep track of things behind their back. I still think that was the correct architectural approach: the role of Drizzle was to put SQL on top of a storage engine, not to also be one itself.

Looking at the MySQL code, there’s one giant commit 31350e8ab15179acab5197fa29d12686b1efd6ef. I do mean giant too, the diffstat is amazing:

 786 files changed, 58471 insertions(+), 25586 deletions(-)

How anyone even remotely did code review on that I have absolutely no idea. I know the only way I could get it to work in Drizzle was to do it incrementally, a series of patches that gradually chiseled out what needed to be taken out so I could put it an API and the protobuf code.

Oh, and in case you’re wondering:

- uint offset,pack_flag;
+ uint offset;

Thank goodness. Now, you may not appreciate that as much as I might, but pack_flag was not the height of design, it was… pretty much a catchalll for some kind of data about a field that wasn’t something that already had a field in the FRM. So it may include information on if the field could be null or not, if it’s decimal, how many bytes an integer takes, that it’s a number and how many oh, just don’t ask.

Also gone is the weird interval_id and a whole bunch of limitations because of the FRM format, including one that I either just discovered or didn’t remember: if you used all 256 characters in an enum, you couldn’t create the table as MySQL would pick either a comma or an unused character to be the separator in the FRM!?!

Also changed is how the MySQL server handles default values. For those not aware, the FRM file contains a static copy of the row containing default values. This means the default values are computed once on table creation and never again (there’s a bunch of work arounds for things like AUTO_INCREMENT and DEFAULT NOW()). The new sql/ is where this is done now.

For now at least, table metadata is also written to a file that appears to be JSON format. It’s interesting that a SQL database server is using a schemaless file format to describe schema. It appears that these files exist only for disaster recovery or perhaps portable tablespaces. As such, I’m not entirely convinced they’re needed…. it’s just a thing to get out of sync with what the storage engine thinks and causes extra IO on DDL (as well as forcing the issue that you can’t have MVCC into the data dictionary itself).

What will be interesting is to see the lifting of these various limitations and how MariaDB will cope with that. Basically, unless they switch, we’re going to see some interesting divergence in what you can do in either database.

There’s certainly differences in how MySQL removed the FRM file to the way we did it in Drizzle. Hopefully some of the ideas we had were helpful in coming up with this different approach, as well as an extra seven years of in-production use.

At some point I’ll write something up as to the fate of Drizzle and a bit of a post-mortem, I think I may have finally worked out what I want to say…. but that is a post for another day.

9 thoughts on “MySQL removes the FRM (7 years after Drizzle did)

  1. Hi Stewart,

    You are correct in that the JSON file is designed for use only in recovery and import/export purposes. The single source of truth lies in the native dictionary in InnoDB :)

    We call it SDI (Serialized Dictionary Information). InnoDB tables contain a copy inside the .ibd file, but there will also be one .SDI per schema.

    Btw – broken link on the github commit. URL is repeated twice.

    – Morgan

  2. As far as MariaDB is concerned, we’ve removed most FRM limitations in 10.0, three years ago. New data are stored in FRM in the format (so, no fixed bytes at known offsets, that is, easily extendable), that’s how we store new GIS information, table UUID, arbitrary (not known in advance!) table/column/index attributes, etc.

    Also, FRMs are no longer needed since 10.0, they’re optional, and that depends on the engine. An engine without a native data dictionary can continue store table definitions in FRM files, engines with a data dictionary can opt to using FRMs as a *metadata cache* (automatically invalidated and updated as needed) or avoid FRMs altogether.

    DEFAULT values can be arbitrary expressions since 10.2.

  3. Hi Stewart,

    I know it looks like we did it in one giant commit, but that’s not how it really happened. We had a few hundred commits by several people over the course of a couple of years. In the end, the commit history became so confusing (mostly due to tons of merging) that we decided to squash all those commits into one. While this was the single largest piece, we have done quite a bit of additional work after that.

  4. I figured it had to be developed as such. One thing that hasn’t really caught on in the MySQL world is “git rebase”, which is excellent for such large refactoring efforts. It also gets you a linear commit history that’s much easier to bisect if/when something goes wrong and you need to debug.

    We used it all the time when working on MySQL Cluster (although it was actually using quilt to maintain a series of patches, but same effect), and we use it all the time for the kernel and firmware.

    I should write a post on “rebase > merge” at some point in the future.

  5. I had thought MariaDB had done some work here, and I have to admit I hadn’t really looked too closely at how it was modified to deal with a bunch of the new features.

  6. Morgan – thanks for the headsup, link fixed.

    I think there’s a benefit for not having a separate file at all, and having the engine be responsible for either having a separate file or having the data stored somewhere inside the engine.

  7. Our dictionary branch was branched off well before we switched from bzr to git.
    Git rebase is indeed quite nice. It is one of the reasons our commit history looks better now than it used to (imo).

  8. Pingback: Log Buffer #489: A Carnival of the Vanities for DBAs | Official Pythian® Blog – Cloud Data Architect

Leave a Reply