The Table protobuf message format

If you’ve ever opened up drizzled/message/table.proto in the Drizzle source tree you will have seen what’s in the table message: the structure that describes a database table in Drizzle. Previously I’ve talked about the Table message more generally, giving a fair bit of history of the FRM file and how we’ve replaced it with both the Table protobuf message and an infrastructure inside Drizzle so that Storage Engines own their own metadata.

Yesterday I talked about the Schema protobuf message format in more detail, and this time I’m talking about the Table protobuf message in a similar amount.

The first time we were loading (then only part of) the table definition out of a protobuf message was way back in January 2009 (I blogged about it too). It was an adventure untangling all sorts of things to get to a much nicer place (where we are now). The code in the server is not perfect… I’ll be the first to admit that some of it is rather strange, but that’s mostly all behind the scenes for people interested in the protobuf Table message!

The Table message has several embedded messages in it too. We need to have information on the Storage Engine, Fields and Indexes (and each of those can have other properties). It is much more complex than the simple Schema message.

Let’s have a look at the basic structure of the Table message:

message Table {

    /* *SNIP* (Here goes the definitions for TableType, StorageEngine, Field, Index, ForeignKeyConstrain, TableOptions and TableStats) */

  required string name = 1;
  required TableType type = 5;
  required StorageEngine engine = 2;
  repeated Field field = 3;
  repeated Index indexes = 4;

  repeated ForeignKeyConstraint fk_constraint = 8;
  optional TableOptions options = 9;
  optional TableStats stats = 10;
}

(We’ve skipped the definitions for the embedded messages for now)

This seems all pretty logical; a table has a name, a type, is in a Storage Engine, has Fields, may have Indexes, may have foreign key constraints, it has some options and statistics (the statistics may go away at some point “soon”).

Let’s have a look at the TableType message definition:

  enum TableType {
    STANDARD = 0;
    TEMPORARY = 1;
    INTERNAL = 2;
  }

It’s pretty simple, the table type is either a standard table (what you get from CREATE TABLE), a temporary table (what you get from CREATE TEMPORARY TABLE) or an INTERNAL table (what you get when Drizzle uses a temporary table during query execution).

Next, the StorageEngine message:

  message StorageEngine {

    message EngineOption {
      enum EngineOptionType {
        BOOL = 0;
        INTEGER = 1;
        STRING = 2;
      }

      required string option_name = 1;
      required string option_value = 2;
      required EngineOptionType option_type = 3;
    }

    required string name = 1;
    repeated EngineOption option = 2;
  }

The main part is the “name” member, which is just the name of the storage engine (e.g. “PBXT”,  “INNODB”, “ARCHIVE”). We do however have support specified in the StorageEngine message for engine specific options (in key value form). Expect these to be used more in the near future.

Specifying Fields is probably the most complex part of the table message. The Field message looks like this (with many embedded messages):

message Field {
    required string name = 1;
    required FieldType type = 2;
    optional FieldFormatType format = 3;
    optional FieldOptions options = 4;
    optional FieldConstraints constraints = 5;
    optional NumericFieldOptions numeric_options = 6;
    optional StringFieldOptions string_options = 7;

    optional string comment = 16; /* Reserve 0-15 for frequently accessed attributes */
    optional SetFieldOptions set_options = 17;
    optional TimestampFieldOptions timestamp_options = 18;
}

So… what does this all mean? Well, Fields have a type, they’re stored in a format, there’s options attached to them, there may be constraints as well as field type specific options.

The different field types should be fairly familiar by now:

    enum FieldType {
      DOUBLE = 0;
      VARCHAR = 1;
      BLOB = 2;
      ENUM = 3;
      INTEGER = 4;
      BIGINT = 5;
      DECIMAL = 6;
      DATE = 7;
      TIME = 8;
      TIMESTAMP = 9;
      DATETIME = 10;
    }

We also allow fields in different formats. Currently, these are default, fixed and dynamic. The idea is you can tell the engine (or the engine can tell you) how it’s storing the field. This is currently here as a nicety and the users for this are few and far between.

    enum FieldFormatType {
      DefaultFormat= 0;
      FixedFormat= 1;
      DynamicFormat= 2;
    }

The FieldOptions get interesting though:

    message FieldOptions {
      optional string default_value = 1;
      optional string update_value = 2;
      optional bool default_null = 3 [default = false];
      optional bytes default_bin_value = 4;
    }

You’ll no doubt be intrigued by the existence of both “default_value” and “default_bin_value”. Ordinarily, using a string to contain a textual representation of the default value (e.g. “foo” or “42”) is fine. However, for BLOB columns, you can have defaults that aren’t representable in a text string, you need binary data (e.g. the default value contains ‘\0’).

For TIMESTAMP columns, we continue to support DEFAULT NOW() and the ability to update the timestamp column on UPDATE. How is this represented in the table message? Well… default_value will be “NOW()” and update_value will be “NOW()”. It is intended that in the future it will be possible to have arbitrary SQL expressions for these. This does, of course, require support in the Drizzle server.

The default_null bool should be rather obvious :)

Well… that’s enough for today. Next time: more of the Field message!

One thought on “The Table protobuf message format

  1. Pingback: Writing A Storage Engine for Drizzle, Part 2: CREATE TABLE | Ramblings

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.