My previous post focused on some of the problems of doing multi-tenant MySQL.
One of the reasons why I started hacking on Drizzle was that the multi-tenancy options for MySQL just weren’t very good (this is also the reason why I run my blog in a VM and not a shared hosting solution).
What you really want is to be able to give your users access to a virtual database server. What you don’t want is to be administering a separate database server for each of your users. What you want are CATALOGs.
A CATALOG is a collection of SCHEMAs (which have TABLEs in them). Each CATALOG is isolated from all the others. Once you connect to a catalog, that’s it. They are entirely separate units. There are no cross-catalog queries or CHANGE CATALOG commands. It is as if each catalog is its own database server.
You can easily imagine a world where there are per-catalog resource limits too (max connections, max temp tables etc).
For the Drizzle7 release, we got in some preliminary support to ensure that the upgrade path would be easy. You’ll notice that all the schemas you create are in the ‘local’ catalog (you can even spot this on the file system in the datadir).
For the next Drizzle milestone, the intent has been to flesh out this support to enable a very elegant multi-tenant solution for Drizzle.
One of the things we worked on a little while ago now is introducing TableIdentifier and SchemaIdentifier objects into Drizzle. Historically (and still in the MySQL codebase) tables would be referenced by a string in the format “database/table_name” (except sometimes when it could be “database\table_name”). There were also various times when this was the name as entered by the user and times when this was converted into a safe form for storing on disk (and comparing to one another).
Everywhere in Drizzle where we have to deal with the path to a table, we call a method on a TableIdentifier for it. There is also a method for getting a series of bytes that can be used as a key in a data structure (e.g. table definition cache). It’s even used to work out what path on the file system to store the data file in.
I would say that the use of TableIdentifier and SchemaIdentifier has prevented many bugs from creeping into the server during development. It took several aborted goes to get something like this into the codebase, just because how the database name, table name and table path strings were being used and abused around the server.
So, with our cleaned up codebase using TableIdentifier and SchemaIdentifier, how hard would it really be to introduce the new level of CATALOG?
So along comes an epic hacking session last weekend. Basically I wanted to see how close I could get in such a short time. I introduced a CatalogIdentifier and then forced the SchemaIdentifier constructor to require one as a parameter….
The great benefit? Every place in the code that I needed to change was a compile error. The overwhelming majority of places I had to fix were shown to me by GCC. I cannot express how wonderful this is.
Anyway, by going through all of these places and fixing up a few things that also needed fixing (including just throwing in a few temporary hacks), I got it working!
I have a tree on launchpad (lp:~stewart/drizzle/multitenant) where you can create multiple catalogs, connect to any of the catalogs you’ve created (or local), create schemas and tables in them. Each catalog is separate. So you can have 1 server with 2 schemas called ‘test’ and 2 tables called ‘t1′.
This is a pretty early development tree… and it comes with some limitations (that will need to be fixed):
- “local” is the default catalog (as it is in Drizzle7)
- “local” must exist
- I_S and DATA_DICTIONARY only show up in local (i.e. SHOW TABLES doesn’t even work yet)
- HailDB is “local” only
- testing equals approximately zero
- currently the only protocol plugin that supports connecting to another catalog is the console plugin
But, as a result of only a weekend of hacking, pretty cool.
Paving the way for easy Database-As-Aa-Service.
Tags: CATALOG, drizzle, hacking, hosting, multitenancy, multitenant, SaaS
We’re going to work on fixing that testing equals approx. zero bit very very soon >: )
Expect some DTR cases first, but have some very good ideas:
1) Stress one catalog while tracking performance on another
2) Stress multiple catalogs to try to kill things >: )
3) Stress + create / drop / populate catalogs
4) data_dictionary query madness whilst stressing the server via catalog stuff
….
awesome, awesome work. And to think that you did this in a weekend…after salmiakki…the elusive Balmer Peak? – http://xkcd.com/323/
why not change database name? for example:
database.table
to
catalog@database.table
or
database.table@catalog
use database
use database@catalog
i don’t know if @ is allowed at filesystems (ext4, ntfs) but it’s just a idea… put this at database and not at a new ‘layer’
today this implementation is done using mysqld_multi (many tcp/ip ports, many my.cnf files…)
just ideas :) good work it’s nice to see how gcc could help hahahaha!! very good!
[...] Stewart has been a real busy guy this last week, another project he has been working on is getting catalogs support working with more than one catalog. For those not familiar with catalogs they are a way of totally isolating one user’s databases from another, similar to having multiple installations of Drizzle in one box but all running from one daemon. In the GA release a lot of the framework already existed for catalogs and everything in it runs from a catalog called ‘local’. More information on the progress Stewart has made can be found on his blog post. [...]
[...] Ramblings Ramblings which occasionally resemble reality. This is the blog of Stewart Smith. « Multi-tenancy Drizzle [...]
I wonder if phpmyadmin will support this feature or stay mysql oriented. Anybody knows?
I think there may be some plans – of course, it’s best to ask the phpmyadmin people about it :)
As for catalog support in phpmyadmin… maybe sometimes after I get it fully going in Drizzle :)
[...] you may have read here. Lots of cool things are incubating though – improvements to replication, catalogs (native multi-tenancy support? what?), and other cool bits (You can also read more here. Henrik [...]
[...] For Drizzle, we’ve created some tables (via the randgen’s data generator if you are curious), saved a copy of the datadir, and then created a test case that uses said datadir for the test server. The test executes some simple SQL queries to make sure we can read the tables properly. This way, if we ever do something to either the server or .dfe format (data format exchange – had a most enlightening conversation with the team about this format’s history at the MySQL UC), we’ll have a broken test that cries about it. From there, we’ll know we have to take some action. The always-amazing Stewart Smith has also created some foreign key backwards compatibility tests, which I *believe* marks further progress towards the magical goodness that is catalogs! [...]