Multi-tenancy Drizzle

My previous post focused on some of the problems of doing multi-tenant MySQL.

One of the reasons why I started hacking on Drizzle was that the multi-tenancy options for MySQL just weren’t very good (this is also the reason why I run my blog in a VM and not a shared hosting solution).

What you really want is to be able to give your users access to a virtual database server. What you don’t want is to be administering a separate database server for each of your users. What you want are CATALOGs.

A CATALOG is a collection of SCHEMAs (which have TABLEs in them). Each CATALOG is isolated from all the others. Once you connect to a catalog, that’s it. They are entirely separate units. There are no cross-catalog queries or CHANGE CATALOG commands. It is as if each catalog is its own database server.

You can easily imagine a world where there are per-catalog resource limits too (max connections, max temp tables etc).

For the Drizzle7 release, we got in some preliminary support to ensure that the upgrade path would be easy. You’ll notice that all the schemas you create are in the ‘local’ catalog (you can even spot this on the file system in the datadir).

For the next Drizzle milestone, the intent has been to flesh out this support to enable a very elegant multi-tenant solution for Drizzle.

One of the things we worked on a little while ago now is introducing TableIdentifier and SchemaIdentifier objects into Drizzle. Historically (and still in the MySQL codebase) tables would be referenced by a string in the format “database/table_name” (except sometimes when it could be “database\table_name”). There were also various times when this was the name as entered by the user and times when this was converted into a safe form for storing on disk (and comparing to one another).

Everywhere in Drizzle where we have to deal with the path to a table, we call a method on a TableIdentifier for it. There is also a method for getting a series of bytes that can be used as a key in a data structure (e.g. table definition cache). It’s even used to work out what path on the file system to store the data file in.

I would say that the use of TableIdentifier and SchemaIdentifier has prevented many bugs from creeping into the server during development. It took several aborted goes to get something like this into the codebase, just because how the database name, table name and table path strings were being used and abused around the server.

So, with our cleaned up codebase using TableIdentifier and SchemaIdentifier, how hard would it really be to introduce the new level of CATALOG?

So along comes an epic hacking session last weekend. Basically I wanted to see how close I could get in such a short time. I introduced a CatalogIdentifier and then forced the SchemaIdentifier constructor to require one as a parameter….

The great benefit? Every place in the code that I needed to change was a compile error. The overwhelming majority of places I had to fix were shown to me by GCC. I cannot express how wonderful this is.

Anyway, by going through all of these places and fixing up a few things that also needed fixing (including just throwing in a few temporary hacks), I got it working!

I have a tree on launchpad (lp:~stewart/drizzle/multitenant) where you can create multiple catalogs, connect to any of the catalogs you’ve created (or local), create schemas and tables in them. Each catalog is separate. So you can have 1 server with 2 schemas called ‘test’ and 2 tables called ‘t1’.

This is a pretty early development tree… and it comes with some limitations (that will need to be fixed):

  • “local” is the default catalog (as it is in Drizzle7)
  • “local” must exist
  • I_S and DATA_DICTIONARY only show up in local (i.e. SHOW TABLES doesn’t even work yet)
  • HailDB is “local” only
  • testing equals approximately zero
  • currently the only protocol plugin that supports connecting to another catalog is the console plugin

But, as a result of only a weekend of hacking, pretty cool.

Paving the way for easy Database-As-Aa-Service.