MySQL modularity, are we there yet?

MySQL is now over four times the size it was at MySQL 3.23. This growth has not come in the shape of plugins.

Have we improved modularity over time? I decided to take a lines-of-code (LoC) count for plugins and storage engines (in the case of Drizzle, memory, myisam and innobase count as storage engines and everything else comes under plugin). I’ve excluded NDB from these numbers as it is rather massive and is pretty much still a separate thing.

Version         Total LoC    Plugin LoC    Storage Engines LoC    Remaining (kernel)
MySQL 3.23.58     371,987        0 (0%)        176,276            195,711 (52% kernel)
MySQL 5.1.68      721,331           228        237,124            483,979 (67% kernel)
MySQL 5.5.30      858,441         2,706        171,009            684,726 (79% kernel)
MySQL 5.6.10    1,049,344        29,122        236,067            784,155 (74% kernel)
MariaDB 5.5     1,142,118        11,781        304,015            826,322 (72% kernel)
Drizzle trunk     334,810        31,150        130,727            172,933 (51% kernel)

I’ve used the non-plugin and non-storage engine code size to be the database “kernel” – i.e. the core of the database server.

What I find really interesting here is that yes, the amount of code that is to some degree modular has increased. However, the amount of code that is a MySQL plugin is still very small compared to the size of the server.

Drizzle is 20-25% of the size of a modern MySQL or MariaDB server and for many applications does largely or exactly the same thing.

Other MySQL branch code sizes

Continuing on from my previous posts, MySQL code size over releases and MariaDB code size, I’ve decided to also look into some other code branches. I’ve used the same methodology as in my previous few posts: sloccount, counting C and C++ code only.

There are also other branches around in pretty widespread use (if only within a single company). I grabbed the Google, Facebook and Twitter patches and examined them too, along with Percona Server 5.1 and 5.5.

Codebase                   LoC (C, C++)   +/- from MySQL
Google v4 patch 5.0.37          970,110   +26,378 (from MySQL 5.0.37)
MySQL@Facebook                1,087,715   +15,768 (from MySQL 5.1.52)
Twitter 5.5.29.t10            1,192,718    +3,624 (from MySQL 5.5.29)
Percona Server 5.1 trunk      1,066,418   +14,878 (from MySQL 5.1.66)
Percona Server 5.5 trunk      1,208,577   +19,483 (from MySQL 5.5.29); +142,159 (from PS 5.1)
Drizzle trunk                   334,810

The Google patch has always had a reputation of being large, and with an extra 26kLOC of code it certainly is the biggest of any of these branches – it was actually a surprise to me that it adds this much code.

The Facebook and Percona Server 5.1 branches are amazingly similar in how much extra code they add, even though they’re not carbon copies of each other. The Twitter patch is quite notable for how little extra code it adds.

For giggles, I included Drizzle – which is (even with all the plugins) less than a third of the size of MySQL 5.1.

It’s clear that the Percona Server and Facebook patches introduce much less code than MariaDB does, which does go with the general wisdom of them being closer to Oracle MySQL than MariaDB is.

If we look at Percona Server, we see that Percona Server 5.5 does indeed contain a bunch more code than Percona Server 5.1 – roughly 5,000 more lines than we’d expect from a simple port from MySQL 5.1 to MySQL 5.5. This feels about right: we’ve added new things to Percona Server 5.5 that weren’t in Percona Server 5.1.

Fun with Coverity found bugs (Episode 1)

Taking inspiration from the great series of blog posts “Fun with Bugs” (and not http://funwithbugs.com/, which is about both caring for and eating bugs), and since I recently went and ran Coverity against Drizzle, I thought I’d do a small series of posts on bugs that it has found (and I’ve fixed).

An idea that has been pervasive in the Drizzle project (and one that I rather subscribe to) is that there are two types of correct: correct and obviously correct. Being obviously correct is much, much better than merely being correct.

The first category of problems that Coverity found was kind of interesting: a warning that data_file_name and index_file_name in class ha_myisam were initialized neither in the ha_myisam constructor nor in any function it calls. It turns out that this was basically because the code wasn’t exactly optimal, and these variables were used kind of oddly. In fact, in writing this blog post I went back and found that there’s a bunch of extra dead code and these should just be removed, along with the code that “used” them.

The historical use for data_file_name and index_file_name was that (in MySQL) you could specify different paths for MyISAM data and index files, so that the FRM ended up in the server datadir, the data file ended up some other place and the index file was off behind the sofa. Since MyISAM is used only for temporary tables in Drizzle, this is entirely not needed.

Another place where a similar bug was found by Coverity was in the SQLExecutor class of the json_server plugin. The _err variable wasn’t initialized in the constructor. After some careful auditing, I think this was actually a false positive as it was set to something before being used, but it was pretty simple to prevent future bugs by initializing it.
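As an illustration of the fix pattern (the SQLExecutor and _err names are from above; the rest of this class is a hypothetical reconstruction, not the actual Drizzle code), initializing the member in the constructor’s initializer list makes it obviously correct:

#include <cstdint>

class SQLExecutor
{
  uint32_t _err;   // previously never set by the constructor

public:
  // Both a human reader and Coverity can now see at a glance that
  // _err always starts out with a defined value.
  SQLExecutor() : _err(0) {}

  uint32_t error() const { return _err; }
};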

Two instances of the same warning: one found a bunch of code to delete (rather useful), and the other is rather minor but may help someone in the future.

Coming up next: total embarrassment bugs.

Finding out What’s Next at BarCampMel 2012 with Drizzle, SQL, JavaScript and a web browser

Just for the pure insane fun of it, I accepted the challenge of “what can you do with the text format of the schedule?” for BarCampMel. I’m a database guy, so I wanted to load it into a database (which would be Drizzle), and I wanted it to be easy to keep it up to date (this is an unconference after all).

So… the text file itself isn’t in any standard format, which means I’d have to parse it. I’m lazy and didn’t want to leave the comfort of the database. Luckily, inside Drizzle we have a js plugin that lets you execute arbitrary JavaScript: parsing solved. I needed to fetch the program, and luckily we have the http_functions plugin, which uses libcurl to let us perform HTTP GET requests. I also wanted the schedule in a table so I could query it when not online, which meant loading the data. Luckily, Drizzle has built-in EXECUTE functionality, so I could use the JavaScript to parse the response from the HTTP GET request and construct SQL to load the data into a table to then query.

So, grab your Drizzle server with “plugin-add=js” and “plugin-add=http_functions” in the config file or as options to drizzled (prefixed with --) and…

This simple one liner pulls the current schedule and puts it into a table called ‘schedule’:

SELECT EXECUTE(JS("function sql_quote(s) {return s ? '\"'+ s.replace('\"', '\\\"') + '\"' : 'NULL'} function DrizzleDateString(d) { function pad(n) { return n<10 ? '0'+n : n } return d.getFullYear()+'-'+pad(d.getMonth()+1)+'-'+pad(d.getDate())+' '+pad(d.getHours())+':'+pad(d.getMinutes())+':'+pad(d.getSeconds()) } var sql = 'COMMIT;CREATE TABLE IF NOT EXISTS schedule (start_time datetime, stage varchar(1000), mr2 varchar(1000), mr1 varchar(1000), duration int); begin; delete from schedule;' ; var time= new Date;var input= arguments[0].split(\"\\n\"); var entry = new Array(); var stage, mr2, mr1; for(var i=0; i < input.length; i++) { var p= input[i].match('^(.*?) (.*)$'); if(p) {if(p[1]=='Time') { time=new Date(Date.parse(p[2]));} if(p[1]=='Duration') { sql+='INSERT INTO schedule (start_time,stage,mr2,mr1,duration) VALUES (\"' + DrizzleDateString(time) + '\", ' + sql_quote(stage) + ', ' + sql_quote(mr2) + ',' + sql_quote(mr1) + ',' + p[2] + '); '; time= new Date(time.getTime()+p[2]*60*1000); stage= mr2= mr1= ''; } if(p[1]=='stage') {stage=p[2]} if (p[1]=='mr2') {mr2=p[2]} if (p[1]=='mr1') {mr1=p[2]} }}; sql+='COMMIT;'; sql", (select http_get('https://dl.dropbox.com/s/01yh7ji7pswjwwk/live-schedule.txt?dl=1'))));

Which you can then find out “what’s on now and coming up” with this query:

select * from schedule where start_time > DATE_ADD(now(), INTERVAL 9 HOUR) ORDER BY start_time limit 2\G

But it’s totally not fun having to jump to the command line all the time, and you may want it in JSON format for consuming with some web thing… So you can load the json_server plugin, browse to the port that it’s running on (default 8086), type the SQL in there and get a JSON response, or just look at the pretty table there.

Puppet snippet for setting up a machine to build Drizzle

You could use this in a Vagrant setup if you like (I’ve done so for testing).

Step 1) Set the following in your Vagrantfile:

Vagrant::Config.run do |config|
  config.vm.box = "lucid32"
  config.vm.box_url = "http://files.vagrantup.com/lucid32.box"
  config.vm.provision :puppet
end

Step 2) Get the puppet-apt helper.

I used https://github.com/evolvingweb/puppet-apt and put it in a manifests/ directory like so:

$ mkdir manifests
$ cd manifests
$ git clone git://github.com/evolvingweb/puppet-apt.git

Step 3) Write your puppet manifest:

import "puppet-apt/manifests/init.pp"
import "puppet-apt/manifests/ppa.pp"
class drizzlebuild {
        apt::ppa { "ppa:drizzle-developers/ppa": }
        package { "drizzle-dev":
                  ensure => latest,
        }
}
include drizzlebuild

Step 4) “vagrant up” and you’re done! Feel free to build Drizzle inside this VM.

I’m sure there may be some more proper way to do it all, but that was a pretty neat first intro for me to Puppet and friends :)

Puppet + Vagrant + Jenkins = automated bliss

I’m currently teaching myself how to do Puppet. Why? Well, at Percona we support a bunch of platforms for our software. This means we have to maintain a bunch of Jenkins slaves to build the software on. We want to add new machines and have (up until now) maintained a magic “apt-get install” command line in the Jenkins EC2 configuration. This isn’t an ideal situation and there’s been talk of getting Puppet to do the heavy lifting for a while.

So I sat down to do it.

Step 1: take the “apt-get install” line and convert it into puppet speak.

This was pretty easy. I started off with Vagrant starting a Ubuntu Lucid 32 VM (just like in the Vagrant getting started guide) and enabled the “provision using Puppet” bit.

Step 2: find out you need to run “apt-get update”

Since the base VM image was made, there have been updates, so I needed to make any package installation depend on running “apt-get update” – to ensure both that I was installing the latest version and that the repositories would have the files I was looking for.

This was pretty easy (once I knew how):

exec {"apt-update":
       command => "/usr/bin/apt-get update",
}
Exec["apt-update"] -> Package <| |>

This simply does two things: specify to run “apt-get update” and then specify that any package install depends on having run “apt-update” first.

I’ve also needed things such as:

case $operatingsystem {
     debian, ubuntu: { $libaiodev = "libaio-dev" }
     centos, redhat: { $libaiodev = "libaio-devel" }
     default: { fail("Unrecognised OS for libaio-dev") }
}
package { "libaio-dev":
          name => $libaiodev,
          ensure => latest,
}

The idea being that when I go and test all this stuff running on CentOS, it should mostly “just work” there too.

The next step? Setting up and running the Jenkins slave.

Drizzle JSON interface merged

https://code.launchpad.net/~stewart/drizzle/json-interface/+merge/59859

Currently a very early version of course, but it’s there in trunk if you want to play with it. Just have libcurl and libevent installed and you can submit queries via HTTP and JSON. Of course, the next steps are getting a true non-SQL interface going and seeing how people go with it.

Speaking on Tuesday: HailDB and Dropping ACID: Eating Data in a Web 2.0 Cloud World

I’m giving two talks tomorrow (Tuesday) at the MySQL Conference and Expo:

HailDB: A NoSQL API direct to InnoDB, 2:00pm, Ballroom D

Dropping ACID: Eating Data In A Web 2.0 Cloud World, 3:05pm, Ballroom G

The HailDB talk is all about a C API to embed an InnoDB based relational database engine into your application. Awesome stuff (also nice and technical).

The second talk, “Dropping ACID: Eating Data in a Web 2.0 Cloud World” is not only a joke that only database people get, but a humorous and serious look at data integrity and reliability as promised by the current hype. This was quite well received at linux.conf.au in January. So, if you weren’t in Australia in January this year, then certainly come along and see how you go heckling an Australian.

xtrabackup for Drizzle merge request

Follow it over on launchpad.

After having fixed an incredibly odd compiler warning (and, with the -Werror we build with, an error) on OSX (die die die) – xtrabackup for Drizzle is ready to be merged. This will bring it into our next milestone: freemont. Over the next few weeks you should see some good tests merged in for backup and restore too.

While not final final, I’m thinking that the installed binary name will be drizzlebackup.innobase. A simple naming scheme for various backup tools that are Drizzle specific. This casually pre-empts a drizzlebackup tool that can co-ordinate all of these (like the innobackupex script).

Drizzle online backup with xtrabackup

For backups, historically in the MySQL world you’ve had mysqldump (a SQL dump, which means on restore you have to rebuild indexes), InnoDB Hot Backup (proprietary, but takes a copy of the InnoDB data files, so restore is much quicker), LVM snapshots (various scripts exist; this does have a larger IO impact and requires LVM) and, more recently, xtrabackup. Xtrabackup essentially does the same thing as InnoDB Hot Backup except that it’s free and open source software.

Many people have been using xtrabackup successfully for quite a while now.

In Drizzle7, our default storage engine is InnoDB. There have been a few changes, but it is totally InnoDB. This leaves us with the question of backup solutions. We have drizzledump (the Drizzle equivalent of mysqldump – although with fewer gotchas), you could always use LVM snapshots, and the probability of Oracle releasing InnoDB Hot Backup for Drizzle is rather minimal.

So enter xtrabackup as a possible solution… I had thought of porting xtrabackup across for a while. Last weekend, while waiting for one of my iterations of catalog support to compile, I decided to give it a go – I wanted to see how far I could get with it in just that weekend.

I was successful – there’s a tree up at lp:~stewart/drizzle/xtrabackup that produces an xtrabackup binary built for Drizzle (it’s not quite ready for merging yet; there are some obvious bugs around command line option parsing… but a backup and restore did work).

I wanted the following:

  • build to be integrated with Drizzle, using the same innobase build that we use to build the server
  • build with strict compiler warnings and -Werror (which we do for Drizzle)
  • build with a C++ compiler (as we do with innobase in Drizzle)
  • not re-add parts of mysys into the Drizzle build just for xtrabackup

I’ve already submitted merge requests to upstream xtrabackup containing the compiler fixes and added compiler warnings (they’ve also by now been merged into xtrabackup). Already my work has improved the quality of xtrabackup for everyone. Some of the warnings were fixed slightly differently in xtrabackup than in my Drizzle tree, but I plan to merge.

One issue was that the command line parsing library that xtrabackup uses – my_getopt, which is part of mysys (the portability library inside MySQL) – is long since gone from Drizzle. We currently use Boost::program_options. Thanks to the heroic efforts of Andrew Hutchings, xtrabackup in Drizzle is also using boost::program_options. This was a brilliant “hey, can you have a look at this conversion” followed by handing him a tree that did not even remotely compile, followed by a “I have to take the kids somewhere, here’s a tree – it may compile”. Amazingly enough, it pretty much did compile once I fixed the other issues.
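To give a flavour of what the conversion looks like, here’s a tiny self-contained sketch (the option shown is illustrative only; see the tree for the real set of options): my_getopt’s struct my_option tables become boost::program_options declarations.

#include <boost/program_options.hpp>
#include <iostream>
#include <string>

namespace po = boost::program_options;

int main(int argc, char** argv)
{
  po::options_description desc("xtrabackup options");
  desc.add_options()
    ("help", "print this help message")
    ("target-dir", po::value<std::string>()->default_value("./backup"),
     "destination directory for the backup");

  po::variables_map vm;
  po::store(po::parse_command_line(argc, argv, desc), vm);
  po::notify(vm);

  if (vm.count("help"))
  {
    std::cout << desc << std::endl;
    return 0;
  }

  std::cout << "backing up to "
            << vm["target-dir"].as<std::string>() << std::endl;
  return 0;
}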

An unresolved issue is how to deal with this going forward – my guess is that upstream xtrabackup doesn’t want to require Boost.

One solution could be just to factor out the command line options into a separate file that we can ignore for Drizzle and replace with our own. The other option could be to use a different command line option parsing library (perhaps from CCAN, as it’s then maintained by somebody else and doesn’t require heaps and heaps of other stuff).

Another issue I had to tackle is the patch to innobase that’s required to build xtrabackup.

I took a very minimal approach for the Drizzle patch. We are currently based on innobase 1.1.4 from MySQL 5.5 – so I mostly looked at the xtradb55 patch. I think it would be great if these were, instead of one giant patch, a series of patches to apply (a-la quilt) to a) make it easier to maintain and b) make it easier for myself to work out the exact reasoning of each bit (also, generating the patches with -p would help a fair bit too).

So how did I do it?

Step 0
was removing support for old innobase – we totally don’t need it for Drizzle.

Step 1
was creating a srv_read_only option for Drizzle’s innobase. This was fairly easy. The one thing I did have to change was adding a check in os_file_lock() so that we don’t attempt to write lock the ibdata files when in read only mode (otherwise backups can’t be taken while drizzled is running). I’m a little surprised that this wasn’t hit in 5.5 at all.
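The shape of that change is roughly the following sketch (a reconstruction for illustration – srv_read_only and os_file_lock() are named above, everything else here is assumed rather than being the actual innobase diff):

#include <fcntl.h>
#include <unistd.h>

extern bool srv_read_only;   /* set when opening the files read only */

static int os_file_lock(int fd, const char* name)
{
  (void)name;   /* only used in the (omitted) error message */

  /* When read only, skip the exclusive lock entirely so that a backup
     can open the ibdata files while drizzled still holds its lock. */
  if (srv_read_only)
    return 0;

  struct flock lk;
  lk.l_type = F_WRLCK;    /* exclusive write lock... */
  lk.l_whence = SEEK_SET;
  lk.l_start = 0;
  lk.l_len = 0;           /* ...over the whole file */

  return fcntl(fd, F_SETLK, &lk) == -1 ? -1 : 0;
}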

Step 2
was implementing srv_fake_write. I’m pretty sure I’ve gotten this right in the Drizzle implementation, but the patch wasn’t as easy to read as I’d really like. I probably need to do a bit more of a code audit to confirm that this is actually correct (I may try and come up with an LD_PRELOAD library that will scream loudly if writes are made to files matching a pattern).
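The LD_PRELOAD library I have in mind would look something like this sketch (Linux-specific, and matching on “ibdata” is purely an assumption for illustration): intercept write(), resolve the fd back to a path via /proc, and scream if it matches.

#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for RTLD_NEXT */
#endif
#include <dlfcn.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

extern "C" ssize_t write(int fd, const void* buf, size_t count)
{
  static ssize_t (*real_write)(int, const void*, size_t) =
      (ssize_t (*)(int, const void*, size_t))dlsym(RTLD_NEXT, "write");

  /* Resolve the fd back to a file name (Linux-specific). */
  char link[64], path[4096];
  snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
  ssize_t len = readlink(link, path, sizeof(path) - 1);
  if (len > 0)
  {
    path[len] = '\0';
    if (strstr(path, "ibdata") != NULL)   /* the pattern is an assumption */
    {
      fprintf(stderr, "SCREAM: write() to %s during fake-write run!\n", path);
      abort();
    }
  }

  return real_write(fd, buf, count);
}

Build it as a shared library (something like “g++ -shared -fPIC scream.cc -o scream.so -ldl”) and run the test binary under LD_PRELOAD=./scream.so.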

Step 3
was implementing srv_apply_log_only. Pretty sure I have this right; again, more testing will be required. Why? Because I’m that paranoid about getting things very, very right.

Step 4
was to go through all the functions that xtrabackup needed to not be static. Instead of having prototypes for them in xtrabackup.cc, I added an xtrabackup_api.h header to innobase and included it where needed (including in xtrabackup). I’d recommend this way going forward for xtrabackup too, as it could be a lot less problematic to maintain (and makes the xtrabackup source a bit easier to read).
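Something along these lines (the function names below are hypothetical placeholders rather than the real innobase symbols – the point is the single shared header):

/*
  xtrabackup_api.h: declarations for formerly-static innobase functions
  that xtrabackup calls, included by both innobase and xtrabackup
  instead of duplicating prototypes inside xtrabackup.cc.
*/
#pragma once

#include <stdint.h>

/* Hypothetical examples of the kind of functions exposed: */
uint64_t xb_read_checkpoint_lsn(void);
void     xb_apply_log_records(uint64_t from_lsn);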

Step 5
was fixing up a few skeleton functions that were needed to make our innobase happy. It may not be a bad idea to split the skeleton functions out into a separate source file so they’re a bit easier to track (with some #ifdefs around those not needed for certain releases).

I’m hoping to work with the upstream xtrabackup devs on the various points I’ve made above.
Another thought of mine is to port xtrabackup into HailDB, where we can use much neater API functions to create good tests for xtrabackup.

Thanks go out to all who’ve worked on xtrabackup. It honestly wasn’t too hard getting it ported across to Drizzle – and with a bit of collaboration I think we can make it easy to keep up to date.

What’s the future for Xtrabackup in Drizzle? It’ll likely end up being a binary named drizzlebackup-innobase or similar (this means that there is a clear difference between xtrabackup for MySQL and what we have in Drizzle – which is more accurately defined as based on xtrabackup). We’ll also probably want a nice wrapper or integration with a backup tool to deal with everything Drizzle related. We shall also introduce a lot of testing; backups are important.

Xtrabackup is topical: check out the latest OurSQL podcast and the Percona Xtrabackup website for more info!

Multi-tenancy Drizzle

My previous post focused on some of the problems of doing multi-tenant MySQL.

One of the reasons why I started hacking on Drizzle was that the multi-tenancy options for MySQL just weren’t very good (this is also the reason why I run my blog in a VM and not a shared hosting solution).

What you really want is to be able to give your users access to a virtual database server. What you don’t want is to be administering a separate database server for each of your users. What you want are CATALOGs.

A CATALOG is a collection of SCHEMAs (which have TABLEs in them). Each CATALOG is isolated from all the others. Once you connect to a catalog, that’s it. They are entirely separate units. There are no cross-catalog queries or CHANGE CATALOG commands. It is as if each catalog is its own database server.

You can easily imagine a world where there are per-catalog resource limits too (max connections, max temp tables etc).

For the Drizzle7 release, we got in some preliminary support to ensure that the upgrade path would be easy. You’ll notice that all the schemas you create are in the ‘local’ catalog (you can even spot this on the file system in the datadir).

For the next Drizzle milestone, the intent has been to flesh out this support to enable a very elegant multi-tenant solution for Drizzle.

One of the things we worked on a little while ago now is introducing TableIdentifier and SchemaIdentifier objects into Drizzle. Historically (and still in the MySQL codebase) tables would be referenced by a string in the format “database/table_name” (except sometimes when it could be “database\table_name”). There were also various times when this was the name as entered by the user and times when this was converted into a safe form for storing on disk (and comparing to one another).

Everywhere in Drizzle where we have to deal with the path to a table, we call a method on a TableIdentifier for it. There is also a method for getting a series of bytes that can be used as a key in a data structure (e.g. table definition cache). It’s even used to work out what path on the file system to store the data file in.

I would say that the use of TableIdentifier and SchemaIdentifier has prevented many bugs from creeping into the server during development. It took several aborted goes to get something like this into the codebase, just because of how the database name, table name and table path strings were being used and abused around the server.

So, with our cleaned up codebase using TableIdentifier and SchemaIdentifier, how hard would it really be to introduce the new level of CATALOG?

So along comes an epic hacking session last weekend. Basically I wanted to see how close I could get in such a short time. I introduced a CatalogIdentifier and then forced the SchemaIdentifier constructor to require one as a parameter….

The great benefit? Every place in the code that I needed to change was a compile error. The overwhelming majority of places I had to fix were shown to me by GCC. I cannot express how wonderful this is.
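In sketch form, the change looks something like this (hypothetical code for illustration – the real Drizzle classes are considerably richer):

#include <string>

class CatalogIdentifier
{
  std::string catalog_;
public:
  explicit CatalogIdentifier(const std::string& name) : catalog_(name) {}
  std::string getPath() const { return "./" + catalog_; }
};

class SchemaIdentifier
{
  CatalogIdentifier catalog_;
  std::string schema_;
public:
  // No other constructor: every caller *must* say which catalog it
  // means, so every call site not yet updated becomes a compile error.
  SchemaIdentifier(const CatalogIdentifier& catalog,
                   const std::string& schema)
    : catalog_(catalog), schema_(schema) {}

  // All file system path construction goes through here, rather than
  // ad-hoc "database/table_name" strings scattered around the server.
  std::string getPath() const
  { return catalog_.getPath() + "/" + schema_; }

  // Bytes usable as a key in a data structure such as the table
  // definition cache.
  std::string getKey() const
  { return catalog_.getPath() + '\0' + schema_; }
};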

Anyway, by going through all of these places and fixing up a few things that also needed fixing (including just throwing in a few temporary hacks), I got it working!

I have a tree on launchpad (lp:~stewart/drizzle/multitenant) where you can create multiple catalogs, connect to any of the catalogs you’ve created (or local), create schemas and tables in them. Each catalog is separate. So you can have 1 server with 2 schemas called ‘test’ and 2 tables called ‘t1’.

This is a pretty early development tree… and it comes with some limitations (that will need to be fixed):

  • “local” is the default catalog (as it is in Drizzle7)
  • “local” must exist
  • I_S and DATA_DICTIONARY only show up in local (i.e. SHOW TABLES doesn’t even work yet)
  • HailDB is “local” only
  • testing equals approximately zero
  • currently the only protocol plugin that supports connecting to another catalog is the console plugin

But, as a result of only a weekend of hacking, pretty cool.

Paving the way for easy Database-as-a-Service.

Things I’ve done in Drizzle

When writing my Dropping ACID: Eating Data in a Web 2.0 Cloud World talk for LCA2011, I came to the realisation that I had forgotten a lot of the things I had worked on in MySQL and MySQL Cluster. So, as a bit of a retrospective as part of the Drizzle7 GA release, I thought I might try and write down an (incomplete) list of the various things I’ve worked on in Drizzle.

I noticed I did a lot of code removal; that’s all fine and dandy, but maybe I won’t list all of that… except perhaps my first branch that was merged :)

2008:

  • First ever branch that was merged: some mysys removal (use POSIX functions directly instead of wrappers that sometimes have different semantics than their POSIX counterparts), some removal of NETWARE support, removal of build scripts that weren’t helpful (i.e. weren’t what any build team ever used to build a release) and some other dead code removal.
  • Improve ‘make test’ time – transactions FTW! (this would be a theme for me over the years, I always want build and test to be faster)
  • Started moving functions out into their own plugins, removing the difference between UDFs (User Defined Functions) and builtin functions. One API to rule them all.
  • Ahhh compiler warnings (we now build with -Werror) and unchecked return codes from system calls (we now add this to some of our own methods, finding and fixing even more bugs). We did enable a lot of compiler warnings and OMG fix a lot of them.
  • Removal of FRM – use a protobuf message to describe the table. The first branch that implemented any of this was merged mid-November. It was pretty minimal, and we still had the FRM around for a little while yet – but it was the beginning of the end for features that couldn’t be implemented due to limitations in the FRM file format. I wrote a few blog entries about this.
  • A lot of test fixes for Drizzle
  • After relating the story of ☃.test (hint: it broke the ability to check out the source tree on Solaris) to Anthony Baxter, he suggested trying something rather nasty… a unicode character above 2^16 – I chose 𝄢 – which at the time didn’t even render on Ubuntu – this was the test case you could only see correctly on MacOS X at the time (or some less broken Linux distro I guess). Some time later, I was actually able to view the test file on Ubuntu correctly.
  • I think it was November 1st when I started to work on Drizzle full time – this was rather awesome, although a really difficult decision as I did rather enjoy working with all the NDB guys.

2009:

  • January sparked the beginning of reading table information from the table protobuf message instead of the FRM file. The code around the FRM file was lovely and convoluted in places – to this day I’m surprised at the low bug count all that effort resulted in. One day I may write up a list of bugs and exploits probably available through the FRM code.
  • My hate for C++ fstream started in Feb 2009
  • In Feb I finally removed the code to work out (and store) in the FRM file how to display a set of fields for entering data into the table on a VT100 80×24 screen.
  • I filed my first GCC bug. The morning started off with a phone call and Brian asking me to look at some strange bug and ended with Intel processor manuals (mmm… the 387 is fun), the C language standard and (legitimately) finding it was a real compiler bug. Two hours and two minutes after filing the bug there was a patch to GCC fixing it – I was impressed.
  • By the end of Feb, the FRM was gone.
  • I spent a bit of time making Drizzle on linux-sparc work. We started building with compiler options that should be much friendlier to fast execution on SPARC processors (all to do with aligning things to word boundaries). -Wcast-align is an interesting gcc flag
  • Moved DDL commands to be in StorageEngine rather than handler for an instance of an open table (now known as Cursor)
  • MyISAM and CSV as temporary table only engines – this means we save a bunch of mucking about in the server.
  • Amazingly enough, sprintf is dangerous.
  • Moved to an API around what tables exist in a database so that the Storage Engines can own their own metadata.
  • Move to have Storage Engines deal with the protobuf table message directly.
  • Found out you should never assume that your process never calls fork() – if you use libuuid, it may. If it wasn’t for this and if we had param-build, my porting of mtr2 to Drizzle probably would have gone in.
  • There was this thing called pack_flag – its removal was especially painful.
  • Many improvements to the table protobuf message (FRM replacement) code and format – moving towards having a file format that could realistically be produced by code that wasn’t the Drizzle database kernel.
  • Many bug fixes; some caused by us, others exposed by us, and others had always been there.

2010:

  • embedded_innodb storage engine (now HailDB). This spurred many bug and API fixes in the Storage Engine interface. This was an education in all the corner cases of the various interfaces, where the obvious way to do things was almost always not fully correct (but mostly worked).
  • Engine and Schema custom KEY=VALUE options.
  • Found a bug in the pthread_mutex implementation of the atomics<> template we had. That was fun tracking down.
  • Started writing more test code for the storage engine interface and upper layer. With our (much improved) storage engine interface, it became relatively easy to implement a storage engine to test specific bits of the upper layer.
  • Wrote a CREATE TABLE query that would take over four minutes to run. Fixed the execution time too. This only existed because of a (hidden and undocumented) limitation in the FRM file format to do with ENUM columns.
  • Characters versus bytes is an important difference, and one that not all of the code really appreciated (or dealt with cleanly)
  • FOREIGN KEY information now stored in the table protobuf message (this was never stored in the FRM).
  • SHOW CREATE TABLE now uses a library that reads the table protobuf message (this will later be used in the replication code as well)
  • Started the HailDB project
  • Updated the innobase plugin to be based on the current InnoDB versions.
  • Amazingly, noticed that the READ_COMMITTED isolation level was never tested (even in the simplest way you would ever explain READ_COMMITTED).
  • Created a storage engine for testing called storage_engine_api_tester (or SEAPITester). It’s a dummy engine that checks we’re calling things correctly. It exposed even more bugs and strangeness that we weren’t aware of.
  • Fixed a lot of cases in the code where we were using a large stack frame in a function (greater than 32kb).
  • An initial patch using the internal InnoDB API to store the replication log

2011:

  • This can be detailed later, it’s still in progress :)
  • The big highlight: a release.

Drizzle7

We’ve released Drizzle7! Not only that, we’re now calling it Generally Available – a GA release.

What does this mean? What does this GA label mean?

You could view the GA label as meaning “we’re pretty confident people aren’t going to en masse ask for our heads when they start using it”… which isn’t too bad a description. We also plan to maintain it: there could be future releases in this series that just include bug fixes – we won’t just immediately tell you to go and use the latest tarball or bzr tree. This release series is a good one to use.

Drizzle7 is something that can be packaged in Linux distros. It’s no longer something where the best bet is to add the PPA and upgrade every two weeks or build from source yourself. If you’re looking to deploy Drizzle (or develop against it) – you can rely on this release.

I’ll never use the words “production ready” to describe a release – it’s never up to me. It’s up to each person or organisation looking to deploy a piece of software to decide if that bit of software is production ready for them.

Personally, I’m looking forward to seeing how people can break it. While Drizzle is the best tested FOSS SQL RDBMS server, I’m sure there are new and interesting ways it can be broken now that we’re saying we’re ready for a much larger crowd to hammer on it.

Overall, I think we’ve managed to take the now defunct MySQL 6.0 tree (way back in 2008) and release something that can truly live up to the line “database for cloud”. Drizzle is modern, modular, rather solid and understandable. The future is bright: there is so much more to do to make the ultimate database for cloud. Drizzle7 is a great platform to build on – both for us (developers) and us (people who use relational databases).

Fixed in Drizzle: No more “GOTCHA’s”


At the upcoming MySQL Conference and Expo, I’m going to give a Thursday afternoon (2pm) session entitled Fixed in Drizzle: No more “GOTCHA’s”. I plan to have a lot of fun with this session.

If you go back to the very start of when I started submitting code to Drizzle (June 2008) – I was going and fixing some of my favourite “gotcha’s” inside the code: BUILD/ scripts that didn’t build the way releases would, wrappers on POSIX functions with different (and inconsistent) semantics, NETWARE support, a non thread safe client lib, my_errno (different to errno) etc. I won’t really be talking about internals like this – it may give me a happy but really isn’t the latest awesome in databases.

I’ll instead be going over the way more awesome user and DBA visible things we’ve fixed/added/removed from Drizzle that make it a database with as few GOTCHA’s as possible.

Authentication (pluggable, LDAP), Logging (to syslog, gearman), DATA_DICTIONARY, INFORMATION_SCHEMA, engines owning their own metadata, STRICT mode by default, removing global mutexes, improving the Storage Engine API, improving the replication log, including code such as PBXT and PBMS Blob Streaming, filesystem engine (read files from disk like a table), pluggable protocol, UTF8 by default, ENUM data type, auto_increment behaviour.

All this and more is “Fixed in Drizzle”.

(oh, and there’s no 24bit integer or a BLOB that can only be 255 bytes)

Undocumented ALTER TABLE that does *nothing* (useful)

(at least since MySQL 5.1.42)

alter table t1 force;

Pretty neat huh? In fact, in Drizzle this will end up doing a copying alter table. Not useful.

There’s a bug report about this in MySQL that is over four years old (Bug#24091).

I’m just going to remove that bit from the parser in Drizzle – it makes no sense.

SQL Oddity: ALTER TABLE and default values

So, the MySQL (and Drizzle) ALTER TABLE syntax allows you to easily change the default value of a column. For example:

CREATE TABLE t1 (answer int);
ALTER TABLE t1 ALTER answer SET DEFAULT 42;

So, you create a TIMESTAMP column and forgot to set the default value to CURRENT_TIMESTAMP. Easy, just ALTER TABLE:

create table t1 (a timestamp);
alter table t1 alter a set default CURRENT_TIMESTAMP;

(This is left as another exercise for the reader as to what this will do – again, maybe not what you expect)

ALTER TABLE RENAME RENAME RENAME

Here’s a nice challenge for you. What does the following do (or error out on)?

CREATE TABLE t1 (a int);
CREATE TABLE t2 (b int);
ALTER TABLE t1 RENAME t3, RENAME t2, RENAME t4;

I’d be interested to know a) what you think it does, and b) whether you were surprised when you went and typed it into your RDBMS of choice.