New APIs in HailDB

In the current HailDB we have a number of new API calls that you may like:

  • ib_status_get_all()
    Is very similar to ib_cfg_get_all(). This allows the library to add new status variables without applications having to know about them, because the library returns a list of the status variables that exist. For Drizzle, this means that the DATA_DICTIONARY.HAILDB_STATUS table will automatically pick up any new status variables we add to HailDB without a single extra line of code having to be written.
  • ib_set_panic_handler()
    Having a shared library call exit() is generally considered impolite. Previously, if HailDB hit corruption (or some other nasty condition), it could call exit() and you’d never get a chance to display a sensible error message to your user (especially bad in a GUI app, where an error message printed to the console would go unseen). This call lets an application register a callback for when HailDB enters such a condition (see the sketch after this list). We’ll still be unable to continue (and we strongly advise that you do in fact exit the process in your callback), but you’re at least now able to (for example) pop up a dialog box saying sorry.
  • ib_trx_set_client_data()
    This call lets you associate a void* with a transaction. HailDB keeps this pointer in its transaction data structure and in some callbacks (e.g. ib_set_trx_is_interrupted_handler(), see below) will pass this pointer back to you for you to use to help make a decision. In InnoDB in MySQL, this is the THD. In Drizzle, it’s the Session.
  • ib_set_trx_is_interrupted_handler()
    In various wait conditions (e.g. waiting for a row lock), HailDB will call the callback you set with this function with the client data (set with ib_trx_set_client_data()) to work out if the transaction has been cancelled. This enables an application to implement something like the MySQL/Drizzle KILL command to cancel a transaction in another thread.
  • ib_get_duplicate_key()
    If you just got a duplicate key error, this function will tell you what key it was. This allows you to implement a nicer error message.
  • ib_get_table_statistics()
    This function gives you access to some basic table statistics that HailDB maintains. These include an approximate row count, the clustered index size, the total size of the secondary indexes, and a “modified counter” that gives you a rough idea of how out of date the statistics are.
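
To make the callback-related calls above concrete, here is a minimal sketch of how an application might wire ib_set_panic_handler(), ib_trx_set_client_data() and ib_set_trx_is_interrupted_handler() together. The callback argument lists shown are my assumption rather than the documented prototypes (check haildb.h for the real typedefs), and my_session stands in for whatever per-connection structure your application already has (the Session, in Drizzle's case).

    #include <stdio.h>
    #include <stdlib.h>
    #include <haildb.h>

    /* Stand-in for the application's per-connection state (Drizzle's Session). */
    struct my_session
    {
        int killed;   /* set from another thread by a KILL-style command */
    };

    /* Assumed panic-callback shape: report the problem, then exit as advised. */
    static void my_panic_handler(int err, const char *msg)
    {
        fprintf(stderr, "HailDB hit an unrecoverable condition (%d): %s\n", err, msg);
        exit(EXIT_FAILURE);   /* we still can't continue, but the user was told */
    }

    /* Assumed is-interrupted-callback shape: HailDB hands back the void* we
       attached with ib_trx_set_client_data(); non-zero cancels the wait. */
    static int my_trx_is_interrupted(void *client_data)
    {
        const struct my_session *session = client_data;
        return session->killed;
    }

    int main(void)
    {
        struct my_session session = { 0 };

        ib_init();
        ib_set_panic_handler(my_panic_handler);
        ib_set_trx_is_interrupted_handler(my_trx_is_interrupted);
        ib_startup("barracuda");

        ib_trx_t trx = ib_trx_begin(IB_TRX_REPEATABLE_READ);

        /* Let HailDB pass our session back to us in callbacks (e.g. while
           this transaction waits on a row lock). */
        ib_trx_set_client_data(trx, &session);

        /* ... open cursors, read and write rows ... */

        ib_trx_commit(trx);
        ib_shutdown(IB_SHUTDOWN_NORMAL);
        return 0;
    }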

All of these are new to HailDB (and weren’t available in embedded_innodb), many in the new 2.3 development release. You can see usage examples both in the HailDB test suite and (for most of them) in the Drizzle HailDB Storage Engine.

Second Drizzle Beta (and InnoDB update)

We just released the latest Drizzle tarball (2010-10-11 milestone). There are a whole bunch of bug fixes, but there are two things that are interesting from a storage engine point of view:

  • The Innobase plugin is now based on innodb_plugin 1.0.6
  • The embedded_innodb engine is now named HailDB and requires HailDB; it can no longer be built with embedded_innodb.

Those of you following Drizzle fairly closely have probably noticed that we’ve lagged behind in InnoDB versions. I’m actively working on fixing that – both for the innobase plugin and for the HailDB library.

If building the HailDB plugin (which is planned to replace the innobase plugin), you’ll need the latest HailDB release (which as of writing is 2.3.1). We’re making good additions to the HailDB API to enable the storage engine to have the same features as the Innobase plugin.

HailDB 2.0.0 released!

(Reposted from the HailDB Blog. See also the announcement on the Drizzle Blog.)
We’ve made our first HailDB release! We’ve decided to make this a very conservative release: fixing some minor bugs, cleaning up a lot of compiler warnings, and starting the name change in the source from Embedded InnoDB to HailDB.

Migrating your software to use HailDB is really simple; as the sketch after the list below shows, it’s essentially a header and library rename. In fact, for this release, it shouldn’t take more than 5 minutes.

Highlights of this release:

  • A lot of compiler warnings have been fixed.
  • The build system is now pandora-build.
  • Some small bugs have been fixed.
  • The header file is now haildb.h instead of innodb.h.
  • We display “HailDB” instead of “Embedded InnoDB”.
  • The library name is libhaildb instead of libinnodb.
  • It is probably binary compatible with the last Embedded InnoDB release, but we don’t have explicit tests for that, so YMMV.
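
In practice the port really is just a rename. A minimal sketch of a migrated source file, assuming the usual ib_init() entry point from the Embedded InnoDB API:

    #include <haildb.h>   /* was: #include <innodb.h> */

    int main(void)
    {
        /* The ib_* API is used exactly as before; only the header above and
           the link line (-lhaildb instead of -linnodb) need to change. */
        return ib_init() == DB_SUCCESS ? 0 : 1;
    }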

Check out the Launchpad page on 2.0.0 and you can download the tarball either from there or right here:

  • haildb-2.0.0.tar.gz
    MD5:  183b81bfe2303aed435cdc8babf11d2b
    SHA1:  065e6a2f2cb2949efd7b8f3ed664bc1ac655cd75

Storage Engine API: write_row, CREATE SELECT and DDL

(this probably applies exactly the same for MySQL and Drizzle… but I’m just speaking about current Drizzle here)

In my current merge request for the embedded-innodb-create-select-transaction-arrgh branch (also see this specific revision), you’ll notice an odd hoop that we have to jump through to make CREATE SELECT statements work with an engine such as InnoDB.

Basically, this is what happens:

  • start transaction
  • start executing SELECT QUERY (well, prepare executing it and fetch a row)
  • create table
  • attempt to insert into table

But… we have to do the DDL statement (i.e. the CREATE TABLE) in its own transaction. This means that the outer transaction (running the SELECT) shouldn’t be able to see the new table. Except it does: we can create a cursor on the table, but when we try to do something with it (e.g. ib_cursor_first()) we get the DB_MISSING_HISTORY error from InnoDB. With a REPEATABLE READ data dictionary we wouldn’t have this problem; however, we don’t have that.

So? What do we do? If we’re in ::write_row, we get an error, and we’re running a SQLCOM_CREATE_TABLE sql_command (yes, we get to poke into current_session->lex->sql_command to find this out), then we just magically restart the transaction so that we can (properly) see the created table and write rows to it.
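
At the HailDB/Embedded InnoDB call level, the hoop looks roughly like the sketch below. The helper name and the statement_is_create_table flag are hypothetical stand-ins; in the real engine the check is the current_session->lex->sql_command peek mentioned above, inside ::write_row.

    #include <haildb.h>

    /* Hypothetical fragment of the insert path for CREATE TABLE ... SELECT.
       The outer transaction (running the SELECT) started before the CREATE
       TABLE committed in its own transaction, so touching the new table
       yields DB_MISSING_HISTORY. */
    static ib_err_t cursor_first_with_restart(ib_trx_t *outer_trx,
                                              ib_crsr_t *cursor,
                                              const char *table_name,
                                              int statement_is_create_table)
    {
        ib_err_t err = ib_cursor_first(*cursor);

        if (err == DB_MISSING_HISTORY && statement_is_create_table)
        {
            /* Magically restart the outer transaction so that it can
               (properly) see the table created by the DDL transaction. */
            ib_cursor_close(*cursor);
            ib_trx_rollback(*outer_trx);

            *outer_trx = ib_trx_begin(IB_TRX_REPEATABLE_READ);
            err = ib_cursor_open_table(table_name, *outer_trx, cursor);
            if (err == DB_SUCCESS)
                err = ib_cursor_first(*cursor);
        }

        return err;
    }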

This is not a sane part of the interface; it won’t be an issue for many engines but it is needed here.

This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

Embedded InnoDB is in the tree!

Well… the start of it :)

I’ve taken the approach of taking tiny incremental steps (and getting review for each step) in implementing a Storage Engine based on the Embedded InnoDB library. What hit lp:drizzle (the trunk branch, for the 2010-04-07 milestone tarball) is only a handful of these small steps, so this engine is not remotely ready for end users.

There should be more of my Embedded InnoDB work hitting the tree in the upcoming days/weeks, enough to get it to a state that one could describe as functional :)

Storing the table message in Embedded InnoDB

One of the exciting things[1] about working on a storage engine in Drizzle is that you get to manage your own metadata. When the database engine you’re writing the storage engine interface for has a pretty complete data dictionary (e.g. Embedded InnoDB) you could just directly use it. At some point I plan to do this for the embedded_innodb engine for Drizzle so that you could just point Drizzle at an existing Embedded InnoDB database and run SQL queries on it.

The Drizzle table message does have some things in it that aren’t in the InnoDB data dictionary though (e.g. table and column comments). We want to preserve these (and also cope with the fact that several data types in Drizzle may map to the same data type in InnoDB). Since the Embedded InnoDB API allows us to do things within the DDL transaction (such as insert a row into a table), we store the serialized table message in a table as part of the DDL transaction. This means we can have fully crash safe DDL! There is no way the table definition can get out of sync with what is in InnoDB; we are manipulating them both in the same transaction!

The table structure we’re using is pretty simple. There are two columns: table_name VARCHAR(IB_MAX_TABLE_NAME_LEN) and message BLOB.

The operations we need are:

  • store the table message in doCreateTable (INSERT)
  • rename the table message in doRenameTable (UPDATE the table_name column)
  • delete the table message in doDropTable (DELETE)
  • list tables in a database (SELECT with prefix)
  • get table message (SELECT using key lookup)

All of which are pretty easy to implement using the Embedded InnoDB API.
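
As a rough illustration of the INSERT case, here is a sketch using the Embedded InnoDB cursor API. The dictionary table name and column order are placeholders (the real engine has its own), and the ib_* signatures are written from memory, so treat this as a sketch rather than the engine’s actual code.

    #include <string.h>
    #include <haildb.h>

    /* Insert the serialized table message using the same transaction that is
       executing the CREATE TABLE, so the two can never get out of sync.
       "data_dictionary/table_definitions" is a placeholder table name. */
    static ib_err_t store_table_message(ib_trx_t ddl_trx,
                                        const char *table_name,
                                        const void *message,
                                        ib_ulint_t message_len)
    {
        ib_crsr_t crsr;
        ib_err_t err = ib_cursor_open_table("data_dictionary/table_definitions",
                                            ddl_trx, &crsr);
        if (err != DB_SUCCESS)
            return err;

        ib_tpl_t tuple = ib_clust_read_tuple_create(crsr);

        /* Column 0: table_name VARCHAR(IB_MAX_TABLE_NAME_LEN)
           Column 1: message BLOB (the serialized table message) */
        ib_col_set_value(tuple, 0, table_name, strlen(table_name));
        ib_col_set_value(tuple, 1, message, message_len);

        err = ib_cursor_insert_row(crsr, tuple);

        ib_tuple_delete(tuple);
        ib_cursor_close(crsr);

        /* No commit here: if the DDL transaction rolls back, so does this row. */
        return err;
    }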

[1] Maybe I need to get out more….