Improving the Storage Engine “API”

I increasingly put the API part of “Storage Engine API” in quotes, as it scores rather high on Rusty's API design levels (coined by Rusty Russell). I give it a 15 (out of 18; lower is better), which in this case is “The obvious use is wrong”.

The idea is that your handler gets called to write a row (the amazingly named handler::write_row()). It’s passed a buffer which is the row to be stored. An engine that uses the MySQL row format (let’s say, ARCHIVE) will simply pack the row and write it out.

Unless there is a TIMESTAMP field with auto set on insert. Up until now (and still now in MySQL) the engine had to check if this was the case and make sure the timestamp field was updated.
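To make the trap concrete, here is a toy sketch of it. Everything in it (Row, Table, the two engine structs, the explicit `now` argument) is a made-up stand-in for illustration, not the real handler interface. The "obvious" engine just stores the row it was handed and silently loses the auto-set timestamp; the careful one has to remember the check.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not the real MySQL/Drizzle types.
struct Row   { std::int64_t id; std::int64_t ts; };   // ts == 0: not set yet
struct Table { bool timestamp_auto_set_on_insert; };

// The obvious implementation: store the buffer as given.
struct ObviousEngine {
    std::vector<Row> stored;
    int write_row(const Table&, Row row, std::int64_t /*now*/) {
        stored.push_back(row);           // timestamp never gets filled in
        return 0;
    }
};

// The correct implementation: the engine must do the server's job too.
struct CarefulEngine {
    std::vector<Row> stored;
    int write_row(const Table& table, Row row, std::int64_t now) {
        if (table.timestamp_auto_set_on_insert)
            row.ts = now;                // easy to forget in every engine
        stored.push_back(row);
        return 0;
    }
};

// Small self-check showing the difference between the two.
void check_obvious_use_is_wrong() {
    Table t{true};
    ObviousEngine obvious;
    CarefulEngine careful;
    obvious.write_row(t, Row{1, 0}, 1234);
    careful.write_row(t, Row{1, 0}, 1234);
    assert(obvious.stored[0].ts == 0);     // wrong: timestamp lost
    assert(careful.stored[0].ts == 1234);  // right, but only by remembering
}
```

Every engine author has to know about that `if`, which is exactly the kind of thing that earns the high score.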

Removing this particular bonghit is actually a really small patch, which Jay recently got merged:

~drizzle-developers/drizzle/development : revision 873.1.16

Hopefully somebody does this soon for MySQL as well.
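I haven't reproduced the patch here, but the general shape of this kind of fix can be sketched, assuming it amounts to doing the timestamp check once in the layer that calls the engine. The names (Row, Table, SimpleEngine, server_insert) are hypothetical stand-ins, not the actual Drizzle code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not the real Drizzle types.
struct Row   { std::int64_t id; std::int64_t ts; };
struct Table { bool timestamp_auto_set_on_insert; };

// The engine now only stores what it is handed.
struct SimpleEngine {
    std::vector<Row> stored;
    int write_row(const Row& row) {
        stored.push_back(row);
        return 0;
    }
};

// The layer above the engine fills in the timestamp once, for all engines.
int server_insert(SimpleEngine& engine, const Table& table,
                  Row row, std::int64_t now) {
    if (table.timestamp_auto_set_on_insert)
        row.ts = now;
    return engine.write_row(row);
}

// Small self-check: the engine gets the timestamp without knowing about it.
void check_server_sets_timestamp() {
    SimpleEngine engine;
    Table t{true};
    assert(server_insert(engine, t, Row{1, 0}, 1234) == 0);
    assert(engine.stored[0].ts == 1234);
}
```

With the check living in one place, the obvious write_row() is also the correct one.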

6 thoughts on “Improving the Storage Engine “API””

  1. Archive doesn’t use the MySQL row buffer format since 5.1. Only HEAP, MyISAM, and PBXT use it.

    Things like TIMESTAMP and the AUTOINCREMENT need to be pushed up above where the engine handles them.

  2. Brian: Watch out for autoincrement though. For NDB, the autoincrement sequence obviously has to be managed *in the cluster*, cannot be local to mysqld. Further, if done wrongly, it can become a global lock to the cluster – this actually was the case until some months ago. Now it is implemented such that each mysqld prefetches for itself enough integers from a global pool, managed by the storage engine. Fetching one integer at a time was not efficient at all.

  3. For NDB it’s always depended on configuration, except in some cases where things were prefetched anyway (e.g. bulk insert when you know how many rows are coming).

    Ideally the auto-inc would be set in ndbd and not api as then you don’t have to send it across the wire that extra time.

  4. @Henrik The engine always has the last say on the data stored. As long as it updates the state of the auto_increment for the row[] and Session object, then all is good.

  5. @Brian: Cool, if “data stored” also includes the state of the autoincrement counter (i.e. the next value, not just the one stored now), which I assume you mean. It just wasn’t obvious to me, and I guess it isn’t obvious that it has to be that way, so I thought pointing it out might be a good idea.

  6. Pingback: Where are they now: MySQL Storage Engines | Ramblings
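The prefetching scheme described in the comments above (each mysqld grabbing a batch of integers from a global pool rather than one at a time) can be sketched roughly like this. It is a single-process toy: GlobalPool and PrefetchingAllocator are hypothetical names, and the mutex merely stands in for whatever coordination the cluster really does.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical global sequence, standing in for one managed in the cluster.
class GlobalPool {
public:
    // Reserve `count` consecutive values and return the first of them.
    std::uint64_t reserve(std::uint64_t count) {
        std::lock_guard<std::mutex> lock(mu_);
        std::uint64_t first = next_;
        next_ += count;
        ++round_trips_;                  // each call models one trip to the pool
        return first;
    }
    std::uint64_t round_trips() const { return round_trips_; }
private:
    std::mutex mu_;
    std::uint64_t next_ = 1;
    std::uint64_t round_trips_ = 0;
};

// Per-server cache: hands out values locally, refills a batch at a time.
class PrefetchingAllocator {
public:
    PrefetchingAllocator(GlobalPool& pool, std::uint64_t batch)
        : pool_(pool), batch_(batch) {}
    std::uint64_t next() {
        if (cur_ == end_) {              // local range exhausted (or never filled)
            cur_ = pool_.reserve(batch_);
            end_ = cur_ + batch_;
        }
        return cur_++;
    }
private:
    GlobalPool& pool_;
    std::uint64_t batch_;
    std::uint64_t cur_ = 0;
    std::uint64_t end_ = 0;
};

// Small self-check: two "servers" never hand out the same value.
void check_prefetch() {
    GlobalPool pool;
    PrefetchingAllocator a(pool, 4), b(pool, 4);
    assert(a.next() == 1);
    assert(b.next() == 5);
    assert(a.next() == 2);
    assert(pool.round_trips() == 2);     // two batches fetched, three values used
}
```

The trade-off is that values stay unique but are no longer handed out in strictly increasing order across servers, and any unused part of a batch becomes a gap.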
