Refactoring Internal temporary tables (another stab at it)

A few weekends ago, I started to again look at the code in Drizzle for producing internal temporary tables. Basically, we have a few type of tables:

  • Standard
  • Temporary (from CREATE TEMPORARY TABLE)
  • Temporary (from ALTER TABLE)
  • Internal temporary (to help with query execution)

If you’re lucky enough to be creating one of the first three types, you go through an increasingly lovely pile of code that constructs a nice protobuf message about what the table should look like and hands all responsibility over to the storage engine as to how to do that. The basic idea is that Drizzle gets the heck out of the way and lets the storage engine do its thing. This code path looks rather different than what we inherited from MySQL. For a start, we actually have a StorageEngine object rather than just lumping everything into the handler (which we correctly name a Cursor). However… the final part, the internal temporary table code is a bit closer to what we inherited from MySQL. There is a good reason for that, it’s ass.

For a start, the table::Singular object is still abused by Item_sum_distinct (see the setup() method) as a tuple (a table with no actual table). This is not ideal and just throws a spanner in the works for refactoring a bunch of code.

The second big problem is that create_tmp_table() doesn’t actually use any normal API calls, instead it manually sets up the table::Singular object. This includes setting up the fields for the table::Singular object in a slightly different way depending on which bit of code called create_tmp_table().

The third big problem is that it’s not storage engine agnostic. Instead of using any existing and sensible way to go and create a temporary table by using the storage engine API it instead creates a series of MI_COLUMNDEF structures which as you may be able to guess, are MyISAM specific and internal data structures.

The forth big problem is that if we end up using HEAP (again, like MyISAM, hard coded) we don’t even call the create table method on the engine. The HEAP (or MEMORY engine as it’s now known) is magic in that it can create tables on open()!

All of these issues make it really, really hard to have another engine with the ability to handle internal temporary tables. You may recall that MariaDB does include the ability to use the Aria engine for internal temporary tables. No, they did not refactor any of this code, they just made a copy of the code and put in Aria where MyISAM was along with some #ifdef for the feature.

Over the past several years I’ve tried a few times to tease this code out and start the process of turning it into something that is palatable. Every one of those times I’ve either failed or gotten sufficiently frustrated that I’ve given up.

I now have a new strategy though. After looking at the code for a good few hours a few weekends ago, I think I have an idea of where to start…. (now just for a few more free weekends to implement it).

12 thoughts on “Refactoring Internal temporary tables (another stab at it)

  1. “You may recall that MariaDB does include the ability to use the Aria engine for internal temporary tables. No, they did not refactor any of this code, they just made a copy of the code and put in Aria where MyISAM was along with some #ifdef for the feature.”

    I wonder why?

  2. @Laurynas, because MariaDB needs to be a drop-in replacement. That means, we try to avoid adding incompatibilities.

    create_tmp_table() does not just accept a list of field types. Instead, it walks query execution structures, and determines the types itself.

    The process depends on what the temporary table is created for – there is code there that check if this is a temporary table created for processing a GROUP BY, or ORDER BY, or something else. The process is very poorly defined.

    This is the reason why the idea of “let’s rewrite create_tmp_table() to be decent” was discussed many times, but never implemented.

    I think, Stewart has a much better chance here, because he doesn’t have to keep backward compatibility with older MySQL versions. Besides that, Drizzle has fewer datatypes, which should make data type inferences simpler.

  3. Thanks Sergei, I suspected there must be reasons for that.

  4. Pingback: Where are they now: MySQL Storage Engines | Ramblings

  5. Pingback: The MEMORY storage engine | Ramblings

  6. Pingback: MySQL 5.7: InnoDB Intrinsic Tables | MySQL Server Blog

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.