Where are they now: MySQL Storage Engines

There was once a big hoopla about the MySQL Storage Engine Architecture and how easy it was to just slot in some other method of storage instead of the provided ones. Over the years I’ve repeatedly mentioned that this wasn’t really the case and that it was remarkably non-trivial.

Over the years many storage engines have cropped up and then disappeared. So… where are they now?

  • ISAM
    This became MyISAM… you know you’ve been around MySQL a long time if you’ve ever had to deal with an ISAM table.
  • Gemini
    This was the first big test of the GPL in court. Basically, you have to obey the GPL (see wikipedia for more info). The code was released as GPL and development stopped. This has been dead since ca 2002.
  • Amira – http://launchpad.net/amira
    Antony first mentioned this in 2008 on his blog. It was a continuation of the Gemini engine, and you can actually go over to Launchpad and get the code. It was one of the projects to build a transactional storage engine not owned by Oracle after Oracle acquired Innobase Oy. It went nowhere special, as Netfrastructure was acquired and became Falcon.
  • BDB
    Otherwise known as the BerkeleyDB engine. It was seldom used and never gained much of a userbase. It was unceremoniously dropped back in 2006, and both users didn’t really exist.
  • PBXT – http://pbxt.blogspot.com/
    I think we can credit PBXT with at least half of the features and performance improvements to InnoDB since it first emerged back in 2006. It got attention very quickly. Why? Because it was different. It had the very rare ability to outperform InnoDB in some places. You can still find PBXT in MariaDB, but sadly it can be hard to fund development of a MySQL storage engine, especially one as tied to MySQL as PBXT is, and it’s no longer under active development. Closely related was the Blob Streaming project which was way ahead of its time as an AlsoSQL access method. The good news is that the code was released under a BSD license in 2012 (was previously GPL). We even had PBXT in Drizzle for a while.
  • Blob Streaming (PBMS) – http://bpbdev.blogspot.com/
    This project was closely related to (but not exclusively dependent on) PBXT. It embedded an HTTP server inside the database and could use it to read and write BLOBs. This was not only fairly cool but way ahead of its time. We owe the existence of both HandlerSocket and the memcached interface to InnoDB to PBMS (it was also an inspiration for the JSON server plugin for Drizzle, to address some of the use cases of the PBMS plugin).
  • Federated
    It’s still there… but is effectively unmaintained and dead. There’s even FederatedX in MariaDB which is an improvement, but still, the MySQL server really doesn’t lend itself kindly to this type of engine… it’s always been an oddity only suitable for very specific tasks.
  • Archive
    Although useful, effectively unmaintained. I kinda don’t want to say dead… but if it went away, I wouldn’t exactly be surprised.
  • CSV
    Currently used to access the log tables in MySQL… and hardly used otherwise. It’s odd that the same code doesn’t deal with SELECT INTO OUTFILE and LOAD DATA INFILE, and I doubt this will ever change. I’d say effectively niche/dead.
  • SolidDB
    Purchased by IBM, abandoned.
  • DB2
    Only ever on System i. Useful for very very few people… but you can still find it around if you’re one of them.
  • Infobright
    OMG it exists! This is probably because they’re largely just using the MySQL server as a way to implement the MySQL network protocol and all of the heavy lifting is done by their own code.
  • Xeround
    I’m quite surprised these guys are still around, as they’re a proprietary storage engine as a service, and initial testing wasn’t entirely promising.
  • TokuDB
    I cannot emphasize enough how much more interesting TokuDB would be if it were open source. It actually holds some promise… and with their recent work with MongoDB, perhaps this is a good way forward for them…
  • Maria/Aria
    Another “OMG Oracle just bought Innobase Oy” engine. This was a project to take MyISAM and turn it into a lean, mean, transactional storage engine machine. It’s still not there and I don’t think it ever will be.
  • Falcon
    This was the hot new thing. It came out of Netfrastructure, which MySQL AB acquired in order to help get a transactional storage engine after Innobase Oy was acquired by Oracle. If you’re keeping count, that’s three projects for a transactional storage engine. Falcon was the star though, receiving all the press and publicity (well before it was ready). There are many reasons why Falcon isn’t around today, the chief one probably being that Oracle bought Sun, who had bought MySQL, and thus the need for an “InnoDB replacement” instantly vanished. There was also immense management pressure for performance to be greater than InnoDB’s, without any allowance for or focus on correctness… and this showed. This was quite disappointing as Falcon had a lot of good architectural things going for it.
  • BlitzDB – https://launchpad.net/blitzdb
    I had hoped we’d replace MyISAM with BlitzDB in Drizzle. It was a wrapper around Tokyo Cabinet to the storage engine API in Drizzle. Unfortunately, the ties to MyISAM are incredibly deep (see my recent post on internal temporary tables) and we never quite got there.

I think these are all the notable engines that were aimed at widespread adoption… which ones have I forgotten?

It’s interesting to note that only Archive, CSV, Xeround, TokuDB and Infobright can be gotten anywhere, and the latter two only in their own distribution (one proprietary) and Xeround only as a service.

PBMS in Drizzle

Some of you may have noticed that blob streaming has been merged into the main Drizzle tree recently. There are a few hooks inside the Drizzle kernel that PBMS uses, and everything else is just in the plugin.

For those not familiar with PBMS, it does two things: it provides a place (not in the table) for BLOBs to be stored (locally on disk or even out to S3), and it provides an HTTP interface to get and store BLOBs.

This means you can do really neat things such as have your BLOBs replicated, consistent and all those nice databasey things as well as easily access them in a scalable way (everybody knows how to cache HTTP).
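To make the "BLOBs over HTTP" idea a bit more concrete, here is a minimal sketch in Python. It is not PBMS and does not use the PBMS URL scheme or API: the in-memory BLOBS store, the BlobHandler class, the /photos/1 path and the Cache-Control value are all made up for illustration. It only shows the general shape: store a BLOB with a PUT, read it back with a GET, and hand out a caching header so that any ordinary HTTP cache could sit in front.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory stand-in for a BLOB repository (NOT the PBMS API).
BLOBS = {}


class BlobHandler(BaseHTTPRequestHandler):
    """Toy handler: PUT stores the request body, GET returns it."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        BLOBS[self.path] = self.rfile.read(length)
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        data = BLOBS.get(self.path)
        if data is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        # Assumed caching policy, purely illustrative.
        self.send_header("Cache-Control", "max-age=3600")
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_server():
    """Start the toy server on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), BlobHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Bypass any proxy settings so requests to 127.0.0.1 go direct.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def put_blob(base_url, key, data):
    """Store bytes under key; returns the HTTP status code."""
    req = urllib.request.Request(base_url + key, data=data, method="PUT")
    with _opener.open(req) as resp:
        return resp.status


def get_blob(base_url, key):
    """Fetch bytes stored under key, plus the Cache-Control header."""
    with _opener.open(base_url + key) as resp:
        return resp.read(), resp.headers.get("Cache-Control")


if __name__ == "__main__":
    server = start_server()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    put_blob(base, "/photos/1", b"some binary data")
    print(get_blob(base, "/photos/1"))
    server.shutdown()
```

The point of the sketch is the access path rather than the storage: once a BLOB is addressable by a URL, clients need nothing beyond a plain HTTP library to fetch it.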

This is a great addition to the AlsoSQL arsenal of Drizzle. I’m looking forward to it advancing and being adopted (much easier now that it’s in the main repository).