Multi-tenancy Drizzle

My previous post focused on some of the problems of doing multi-tenant MySQL.

One of the reasons why I started hacking on Drizzle was that the multi-tenancy options for MySQL just weren’t very good (this is also the reason why I run my blog in a VM and not a shared hosting solution).

What you really want is to be able to give your users access to a virtual database server. What you don’t want is to be administering a separate database server for each of your users. What you want are CATALOGs.

A CATALOG is a collection of SCHEMAs (which have TABLEs in them). Each CATALOG is isolated from all the others. Once you connect to a catalog, that’s it. They are entirely separate units. There are no cross-catalog queries or CHANGE CATALOG commands. It is as if each catalog is its own database server.

You can easily imagine a world where there are per-catalog resource limits too (max connections, max temp tables etc).

For the Drizzle7 release, we got in some preliminary support to ensure that the upgrade path would be easy. You’ll notice that all the schemas you create are in the ‘local’ catalog (you can even spot this on the file system in the datadir).

For the next Drizzle milestone, the intent has been to flesh out this support to enable a very elegant multi-tenant solution for Drizzle.

One of the things we worked on a little while ago now is introducing TableIdentifier and SchemaIdentifier objects into Drizzle. Historically (and still in the MySQL codebase) tables would be referenced by a string in the format “database/table_name” (except sometimes when it could be “database\table_name”). There were also various times when this was the name as entered by the user and times when this was converted into a safe form for storing on disk (and comparing to one another).

Everywhere in Drizzle where we have to deal with the path to a table, we call a method on a TableIdentifier for it. There is also a method for getting a series of bytes that can be used as a key in a data structure (e.g. table definition cache). It’s even used to work out what path on the file system to store the data file in.

I would say that the use of TableIdentifier and SchemaIdentifier has prevented many bugs from creeping into the server during development. It took several aborted goes to get something like this into the codebase, just because how the database name, table name and table path strings were being used and abused around the server.

So, with our cleaned up codebase using TableIdentifier and SchemaIdentifier, how hard would it really be to introduce the new level of CATALOG?

So along comes an epic hacking session last weekend. Basically I wanted to see how close I could get in such a short time. I introduced a CatalogIdentifier and then forced the SchemaIdentifier constructor to require one as a parameter….

The great benefit? Every place in the code that I needed to change was a compile error. The overwhelming majority of places I had to fix were shown to me by GCC. I cannot express how wonderful this is.

Anyway, by going through all of these places and fixing up a few things that also needed fixing (including just throwing in a few temporary hacks), I got it working!

I have a tree on launchpad (lp:~stewart/drizzle/multitenant) where you can create multiple catalogs, connect to any of the catalogs you’ve created (or local), create schemas and tables in them. Each catalog is separate. So you can have 1 server with 2 schemas called ‘test’ and 2 tables called ‘t1’.

This is a pretty early development tree… and it comes with some limitations (that will need to be fixed):

  • “local” is the default catalog (as it is in Drizzle7)
  • “local” must exist
  • I_S and DATA_DICTIONARY only show up in local (i.e. SHOW TABLES doesn’t even work yet)
  • HailDB is “local” only
  • testing equals approximately zero
  • currently the only protocol plugin that supports connecting to another catalog is the console plugin

But, as a result of only a weekend of hacking, pretty cool.

Paving the way for easy Database-As-Aa-Service.

8 thoughts on “Multi-tenancy Drizzle

  1. We’re going to work on fixing that testing equals approx. zero bit very very soon >: )

    Expect some DTR cases first, but have some very good ideas:
    1) Stress one catalog while tracking performance on another
    2) Stress multiple catalogs to try to kill things >: )
    3) Stress + create / drop / populate catalogs
    4) data_dictionary query madness whilst stressing the server via catalog stuff
    ….
    awesome, awesome work. And to think that you did this in a weekend…after salmiakki…the elusive Balmer Peak? – http://xkcd.com/323/

  2. why not change database name? for example:

    database.table
    to
    catalog@database.table
    or
    database.table@catalog

    use database
    use database@catalog
    i don’t know if @ is allowed at filesystems (ext4, ntfs) but it’s just a idea… put this at database and not at a new ‘layer’

    today this implementation is done using mysqld_multi (many tcp/ip ports, many my.cnf files…)

    just ideas :) good work it’s nice to see how gcc could help hahahaha!! very good!

  3. Pingback: Last Week in Drizzle « LinuxJedi's /dev/null

  4. Pingback: Drizzle online backup with xtrabackup | Ramblings

  5. I think there may be some plans – of course, it’s best to ask the phpmyadmin people about it :)

    As for catalog support in phpmyadmin… maybe sometimes after I get it fully going in Drizzle :)

  6. Pingback: drizzle.org

  7. Pingback: New dbqp feature – using pre-created datadirs for tests » workcenter220

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.