MariaDB code size

Continuing on from my previous post, MySQL code size over releases.

I wanted to look at the different branches/patch sets of MySQL out there and work out how far from upstream they deviated. I’m just going to compare against whatever upstream version the most easily accessible version is based on (be it 5.0.x, 5.1.x or whatever).

For MariaDB versions, I removed innodb_plugin and replaced it with xtradb for stats purposes as the MariaDB innodb_plugin is essentially the same as upstream and I don’t want to artificially inflate the diff size.

The first three major versions of MariaDB were all based on MySQL 5.1. I used sloccount and only counted C and C++ code.

So, let’s look at some of the MySQL patch sets/branches that are around. Firstly, let’s look at MariaDB:

Codebase LoC (C, C++) +/- from MySQL +/- from prev maj Version
MariaDB 5.1 1,210,168 +157,532 0
MariaDB 5.2 1,227,434 +174,798 +17,266 (since MariaDB 5.1)
MariaDB 5.3 1,264,995 +212,359 +37,561 (since MariaDB 5.2)
MariaDB 5.5 1,377,405 +187,658 (from MySQL 5.5) +112,410 (since MariaDB 5.3)

From my previous post on lines of code in MySQL versions, we learned that with MySQL 5.6 we saw a 354kLOC increase over MySQL 5.5. What is quite surprising is how close some of the MariaDB differences are to this. With MariaDB 5.5, we’re looking at a 187kLOC difference, which is roughly two thirds that of MySQL 5.6. What’s also interesting is that each incremental MariaDB release has not added nearly as much code as the MySQL 5.1 to 5.5 and 5.5 to 5.6 jumps did.

MariaDB LoC over major versions

The MariaDB code size has also been increasing, if we look at the graph above  you can really see the jump in code size over the past few releases.

If we look at the delta between MariaDB and MySQL, the first MariaDB release (MariaDB 5.1) was certainly a large jump. Each incremental MariaDB release (5.2 and 5.3) have been a smaller delta than the initial one. With MariaDB 5.5 we actually decrease the delta from MySQL, which is something that’s interesting to look at.

If we were going a straight port of MariaDB 5.3 to be based off MySQL 5.5, we’d expect the delta to be around 137kLOC (what MySQL 5.1 to 5.5 is) but it isn’t. The difference to MariaDB 5.5 from MariaDB 5.3 is only ~112kLOC, and the on the whole delta decreases.

But what makes up this big initial jump for MariaDB? Let’s look at some of the MariaDB 5.1 only modules and what’s left:

MariaDB 5.1 component LoC (MariaDB 5.1)
PBXT 45,107
FederatedX 3,076
IBM DB2i 13,486
Total 61,669
Other 95,863

So the MariaDB delta is not increase just because they included some existing modules, there’s more code in there, about as much as any major MySQL version bump.

Tomorrow we look at other MySQL branches, and we see that the MariaDB delta truly is significantly larger than any other MySQL branch.

MySQL code size over releases

As the start of a bit of a delve into the various MySQL branches and patch sets that have been around, let’s start looking at the history of MySQL itself. This is how big MySQL has been over all of the major releases since the beginning (where beginning=3.23). (edit: These numbers were all gathered using sloccount and only counting C++ and C source files.)

Codebase LoC (C, C++) +/- from previous MySQL
MySQL 3.23.58 371,987 0
MySQL 4.0.30 368,695 -3,292 (from MySQL 3.23)
MySQL 4.1.24 859,572 +490,877 (from MySQL 4.0)
+174,352 excluding NDB
MySQL 5.0.96 916,667 +57,095 (from MySQL 4.1)
MySQL 5.1.68 1,052,636 +135,969 (from MySQL 5.0)
MySQL 5.5.30 1,189,747 +137,111 (from MySQL 5.1)
MySQL 5.6.10 1,544,202 +354,455 (from MySQL 5.5)
increase in MySQL source code size over version

MySQL code size over major versions

Note the sharp increase in 5.6

LoC Delta for major MySQL release

We can see that MySQL has had some interesting code size changes over time, the big jump in 4.1 over 4.0 was mostly due to the introduction of MySQL Cluster, but even so, it was a big jump.

MySQL 5.6 is the largest MySQL code size increase in a MySQL version ever. The last time we saw anything like this was with the merging of MySQL Cluster in 4.1. At the very least, Oracle is paying people to write lines of code to extent that nobody has before.

Sessions at the Percona Live MySQL Conference that interest me

For the past many years, there’s been a conference in April, at the Santa Clara Convention Centre where the topic has been MySQL and the surrounding ecosystem. The first year I went, I gave a talk on the new features in MySQL Cluster 5.1 to a overflowing room of attendees. For me, it’s an event that’s mixed with speaking about something I’ve been working on and talking to other attendees about everything from how a particular part of the server works to where we can escape to for nearby good vegan food.

So, I thought I’d share some of the sessions that I’m really looking forward to. My selection is probably atypical, but may be interesting to others. I’m not going to list the keynotes, although they are often of a lot of value. I’m also going to attempt to avoid listing a few really awesome well known speakers simply because there are other really interesting sessions that also need exposure!

  • Starring Sakila: Building Data Warehouses and BI solutions using MySQL and Pentaho
    I need to base decisions off data, not simply a gut feeling (I’m not Stephen Colbert after all). I ran into a bunch of stumbling blocks when trying to work with Pentaho a couple of weeks ago, and I’m really hoping that this session shines some light on how to use it to better and more easily make arguments based on evidence to others in the company.
  • Testing MySQL Databases: The State Of The Art
    I’ve worked with Patrick for several years now, and he’s currently a valuable member of my team at Percona. For those who are interested in the state of the art of open source database testing, this is the session to be in.
  • Getting InnoDB Compression Ready for Facebook Scale
    This session is on at the same time as I’m speaking, so I probably won’t be able to attend (people keep coming to my sessions so I usually can’t sneak out). I’m really interested in how they’ve modified the compression code to help with their (large) workload.
  • Backing Up Facebook
    I hear that Facebook has a couple of database servers, a few dozen users and a few floppy disks full of data. This should be a fun story :)
  • Introducing XtraBackup Manager
    Being responsible for XtraBackup development at Percona, the XtraBackup topics really interest me. Lachlan has been working on a simple backup manager for XtraBackup to help create something that is a more complete backup solution than a tool which simply creates a backup.
  • Extending Xtrabackup – A Point-In-Time System
    Another good case of using XtraBackup as part of a comprehensive backup strategy. I have to be honest, I’m looking for ways in which we can improve XtraBackup to better fit the needs of people. It may be that there are a few small things we can do to make it easier for people do deploy and use.
  • Getting Started with Drizzle 7.1
    We’re about to do the 7.1 release of Drizzle! If you’re interested in having a SQL database that is designed to be used in large scale web applications and cloud environments, come along to this talk.
  • MySQL Idiosyncrasies That Bite
    I have to admit, I’m interested in Ronalds talk here to basically ensure we didn’t miss fixing anything in Drizzle. I do promise not to at any point yell out “Fixed in Drizzle” though.

Go here to register: http://www.percona.com/live/mysql-conference-2012/ (early bird pricing and discounted hotel rooms end March 12th, so you want to register sooner rather than later).

Puppet + Vagrant + jenkins = automated bliss

I’m currently teaching myself how to do Puppet. Why? Well, at Percona we support a bunch of platforms for our software. This means we have to maintain a bunch of Jenkins slaves to build the software on. We want to add new machines and have (up until now) maintained a magic “apt-get install” command line in the Jenkins EC2 configuration. This isn’t an ideal situation and there’s been talk of getting Puppet to do the heavy lifting for a while.

So I sat down to do it.

Step 1: take the “apt-get install” line and convert it into puppet speak.

This was pretty easy. I started off with Vagrant starting a Ubuntu Lucid 32 VM (just like in the Vagrant getting started guide) and enabled the provision using puppet bit.

Step 2: find out you need to run “apt-get update”

Since the base VM I’m using was made there had been updates, so I needed to make any package installation depend on running “apt-get update” to ensure I was both installing the latest version and that the repositories would have the files I was looking for.

This was pretty easy (once I knew how):

exec {"apt-update":
       command => "/usr/bin/apt-get update",
}
Exec["apt-update"] -> Package <| |>

This simply does two things: specify to run “apt-get update” and then specify that any package install depends on having run “apt-update” first.

I’ve also needed things such as:

case $operatingsystem {
     debian, ubuntu: { $libaiodev = "libaio-dev" }
     centos, redhat: { $libaiodev = "aio-devel" }
     default: { fail("Unrecognised OS for libaio-dev") }
}
package { "libaio-dev":
          name => $libaiodev,
          ensure => latest,
}

The idea being that when I go and test all this stuff running on CentOS, it should mostly “just work” there too.

The next step? Setting up and running the Jenkins slave.

Information on Bug#12704861 (which doesn’t exist in any public bug tracker)

Some of you may be aware that MySQL is increasingly using an Oracle-internal bug tracker. You can see these large bug numbers mentioned alongside smaller public bug numbers in recent MySQL release notes. If you’re particularly unlucky, you  just get a big Oracle-internal bug number. For a recently fixed bug, I dug further, posted up on the Percona blog: http://www.mysqlperformanceblog.com/2011/11/20/bug12704861/

Possibly interesting reading for those of you who interested in InnoDB, MySQL, BLOBs and crash recovery.

Speaking on Tuesday: HailDB and Dropping ACID: Eating Data in a Web 2.0 Cloud World

I’m giving two talks tomorrow (Tuesday) at the MySQL Conference and Expo:

HailDB: A NoSQL API direct to InnoDB, 2:00pm, Ballroom D

Dropping ACID: Eating Data In A Web 2.0 Cloud World 3:05pm, Ballroom G

The HailDB talk is all about a C API to embed an InnoDB based relational database engine into your application. Awesome stuff (also nice and technical).

The second talk, “Dropping ACID: Eating Data in a Web 2.0 Cloud World” is not only a joke that only database people get, but a humorous and serious look at data integrity and reliability as promised by the current hype. This was quite well received at linux.conf.au in January. So, if you weren’t in Australia in January this year, then certainly come along and see how you go heckling an Australian.

innodb and memcached

I had a quick look at the source tree (I haven’t compiled it, just read the source – that’s what I do. I challenge any C/C++ compiler to keep up with my brain!) that’s got a tarball up on labs.mysql.com for the memcached interface to innodb. A few quick thoughts:

  • Where’s the Bazaar tree on launchpad? I hate pulling tarballs, following the dev tree is much more interesting from a tech perspective (especially for early development releases). I note that the NDB memcached stuff is up on launchpad now, so yay there. I would love it if the InnoDB team in general was much more open with development, especially with having source trees up on launchpad.
  • It embeds a copy of the memcached server engines branch into the MySQL tree. This is probably the correct way to go. There is no real sense in re-implementing the protocol and network stack (this is about half what memcached is anyway).
  • The copy of the memcached engine branch seems to be a few months old.
  • The current documentation appears to be the source code.
  • The innodb_memcached plugin embeds a memcached server using an API to InnoDB inside the MySQL server process (basically so it can access the same instance of InnoDB as a running MySQL server).
  • There’s a bit of something that kind-of looks similar to the Embedded InnoDB (now HailDB) API being used to link InnoDB and memcached together. I can understand why they didn’t go through the MySQL handler interface… this would be bracing to say the least to get correct. InnoDB APIs, much more likely to have fewer bugs.
  • If this accepted JSON and spat it back out… how fast would MongoDB die? weeks? months?
  • The above dot point would be a lot more likely if adding a column to an InnoDB table didn’t involve epic amounts of IO.
  • I’ve been wanting a good memcached protocol inside Drizzle, we have ,of course, focused on stability of what we do have first. That being said…. upgrade my flight home so I can open a laptop… probably be done well before I land….. (assuming I don’t get to it in the 15 other awesome things I want to hack on this week)

Drizzle online backup with xtrabackup

For backups, historically in the MySQL world you’ve had mysqldump (a SQL dump, means on restore you have to rebuild indexes), InnoDB Hot Backup (proprietary, but takes a copy of the InnoDB data files, so restore is much quicker), LVM snapshots (various scripts exist, does have larger IO impact, requires LVM) and more recently xtrabackup. Xtrabackup essentially does the same thing as InnoDB hot backup except that it’s free and open source software.

Many people have been using xtrabackup successfully for quite a while now.

In Drizzle7, our default storage engine is InnoDB. There have been a few changes, but it is totally InnoDB. This leaves us with the question of backup solutions. We have drizzledump (the Drizzle equivalent to MySQL dump – although with fewer gotchas), you could always use LVM snapshots and the probability of Oracle releasing InnoDB Hot Backup for Drizzle is rather minimal.

So enter xtrabackup as a possible solution… I had though of porting xtrabackup across for a while. Last weekend, while waiting for one of my iterations of catalog support to compile, I decided to give it a go. I wanted to see how far I could get with it also in that weekend.

I was successful – there’s a tree up at lp:~stewart/drizzle/xtrabackup thatproduces an xtrabackup binary that’s built for Drizzle (it’s not quite ready for merging yet, there are some obivous bugs around command line option parsing… but a backup and restore did work).

I wanted the following:

  • build to be integrated with Drizzle, using the same innobase build that we use to build the server
  • build with strict compiler warnings and -Werror (which we do forDrizzle)
  • build with a C++ compiler (as we do with innobase in Drizzle)
  • not re-add parts of mysys into the Drizzle build just for xtrabackup

I’ve already submitted merge requests to upstream xtrabackup containing the compiler fixes and added compiler warnings (they’ve also by now been merged into xtrabackup). Already my work has improved the quality of xtrabackup for everyone. Some of the warnings were fixed slightly differently in xtrabackup than in my Drizzle tree, but I plan to merge.

One issue was that the command line parsing library that xtrabackup uses – my_getopt which is part of mysys (the portability library inside MySQL) is long since gone from Drizzle. We currently use Boost::program_options. Thanks to the heroic efforts of Andrew Hutchings, xtrabackp in Drizzle is also using boost::program_options. This was a brilliant “hey, can you have a look at this conversion” followed by handing him a tree that did not even remotely compile, followed by a “I have to take the kids somewhere, here’s a tree – it may compile”. Amazingly enough, it pretty much did compile once I fixed the other issues.

An unresolved issue is how to deal with this going forward – my guess is that upstream xtrabackup doesn’t want to require Boost.

One solution could be just to factor out command line options into a sepfile that we can ignore for Drizzle and replace with our own. The other option could be to use a differnt command line option parsing library (perhaps from CCAN, as it’s then maintained by somebody else and doesn’t require heaps and heaps of other stuff).

Another issue I had to tackle is the patch to innobase that’s required to build xtrabackup.

I took a very minimal approach for the Drizzle patch. We are currently based on innobase 1.1.4 from MySQL 5.5 – so I mostly looked at the xtradb55 patch. I think it would be great if these were instead of one giant patch a series of patches to apply (a-la quilt) to a) make iteasier maintain and b) easier for myself to work out the exact reasoning of each bit (also, generating the patches with -p would help a fair bit too).

So how did I do it?

Step 0
was removing support for old innobase – we totally don’t need it for Drizzle.

Step 1
was creating a srv_read_only option for Drizzle’s innobase. This was fairly easy. The one thing I did have to change was adding a checkin os_file_lock() so that we don’t attempt to write lock the ibdatafiles when in read only (otherwise backups can’t be taken while drizzledis running). I’m a little surprised that this wasn’t hit in 5.5 at all.

Step 2
was implementing srv_fake_write. I’m pretty sure I’ve gotten this right in the Drizzle implementation, but the patch wasn’t as easy toread as I’d really like. I probably need to do a bit more of a code audit that this is actually correct (I may try and come up with anLD_PRELOAD library that will scream loudly if writes are made to files matching a pattern).

Step 3
was implemnting srv_apply_log_only. Pretty sure I have this right, again, more testing will be required. Why? Because I’m that paranoid about getting things very, very right.

Step 4
was to go through all the functions that xtrabackup needed to not be static. Instead of having prototypes for them inxtrabackup.cc, I instead added a xtrabackup_api.h header to Innobase and included it where needed (including in xtrabackup). I’d recommend this way going forward for xtrabackup too as it could be a lot less problematic to maintain (and makes xtrabackup source a bit easier to read)

Step 5
was fixing up a few skeleton functions that were needed to make our innobase happy. It may not be a bad idea to split out the skeleton functions into a sep source file so it’s a bit easier to track (and some #ifdefs around those not needed for certain releases).

I’m hoping to work with the upstream xtrabackup devs on the various points I’ve made above.
Another thought of mine is to port xtrabackup into HailDB where we can use much more neat API functions to create good tests for xtrabackup.

Thanks go out to all who’ve worked on xtrabackup. It honestly wasn’t too hardgetting it ported across to Drizzle – and with a bit of collaboration I think we can make it easy to keep up to date.

What’s the future for Xtrabackup in Drizzle? It’ll likely end up being a binary named drizzlebackup-innobase or similar (this means that there is a clear difference between xtrabackup for MySQL and what we have in Drizzle – which is more accurately defined as based on xtrabackup). We’ll also probably want a nice wrapper or integration with a backup tool to deal with everything Drizzle related. We shall also introduce a lot of testing; backups are important.

Xtrabackup is topical, check out the latest OurSQL podcast and the the Percona Xtrabackup website for more info!

Things I’ve done in Drizzle

When writing my Dropping ACID: Eating Data in a Web 2.0 Cloud World talk for LCA2011 I came to the realisation that I had forgotten a lot of the things I had worked on in MySQL and MySQL Cluster. So, as a bit of a retrospective as part of the Drizzle7 GA release, I thought I might try and write down a (incomplete) list of the various things I’ve worked on in Drizzle.

I noticed I did a lot of code removal, that’s all fine and dandy but maybe I won’t list all of that… except perhaps my first branch that was merged :)

2008

  • First ever branch that was merged: some mysys removal (use POSIX functions instead of wrappers that sometimes have different semantics than their POSIX functions), some removal of NETWARE, build scripts that weren’t helpful (i.e. weren’t what any build team ever used to build a release) and some other dead code removal.
  • Improve ‘make test’ time – transactions FTW! (this would be a theme for me over the years, I always want build and test to be faster)
  • Started moving functions out into their own plugins, removing the difference between UDFs (User Defined Functions) and builtin functions. One API to rule them all.
  • Ahhh compiler warnings (we now build with -Werror) and unchecked return codes from system calls (we now add this to some of our own methods, finding and fixing even more bugs). We did enable a lot of compiler warnings and OMG fix a lot of them.
  • Removal of FRM – use a protobuf message to describe the table. The first branch that implemented any of this was merged mid-November. It was pretty minimal, and we still had the FRM around for a little while yet – but it was the beginning of the end for features that couldn’t be implemented due to limitations in the FRM file format. I wrote a few blog entries about this.
  • A lot of test fixes for Drizzle
  • After relating the story of .test (hint: it broke the ability to check out the source tree on Solaris) to Anthony Baxter, he suggested trying something rather nasty… a unicode character above 2^16 – I chose 𝄢 – which at the time didn’t even render on Ubuntu – this was the test case you could only see correctly on MacOS X at the time (or some less broken Linux distro I guess). Some time later, I was actually able to view the test file on Ubuntu correctly.
  • I think it was November 1st when I started to work on Drizzle full time – this was rather awesome, although a really difficult decision as I did rather enjoy working with all the NDB guys.

2009:

  • January sparked the beginning of reading table information from the table protobuf message instead of the FRM file. The code around the FRM file was lovely and convoluted in places – to this day I’m surprised at the low bug count all that effort resulted in. One day I may write up a list of bugs and exploits probably available through the FRM code.
  • My hate for C++ fstream started in Feb 2009
  • In Feb I finally removed the code to work out (and store) in the FRM file how to display a set of fields for entering data into the table on a VT100 80×24 screen.
  • I filed my first GCC bug. The morning started off with a phone call and Brian asking me to look at some strange bug and ended with Intel processor manuals (mmm… the 387 is fun), the C language standard and (legitimately) finding it was a real compiler bug. Two hours and two minutes after filing the bug there was a patch to GCC fixing it – I was impressed.
  • By the end of Feb, the FRM was gone.
  • I spent a bit of time making Drizzle on linux-sparc work. We started building with compiler options that should be much friendlier to fast execution on SPARC processors (all to do with aligning things to word boundaries). -Wcast-align is an interesting gcc flag
  • Moved DDL commands to be in StorageEngine rather than handler for an instance of an open table (now known as Cursor)
  • MyISAM and CSV as temporary table only engines – this means we save a bunch of mucking about in the server.
  • Amazingly enough, sprintf is dangerous.
  • Moved to an API around what tables exist in a database so that the Storage Engines can own their own metadata.
  • Move to have Storage Engines deal with the protobuf table message directly.
  • Found out you should never assume that your process never calls fork() – if you use libuuid, it may. If it wasn’t for this and if we had param-build, my porting of mtr2 to Drizzle probably would have gone in.
  • There was this thing called pack_flag – its removal was especially painful.
  • Many improvements to the table protobuf message (FRM replacement) code and format – moving towards having a file format that could realistically be produced by code that wasn’t the Drizzle database kernel.
  • Many bug fixes, some acused by us, others exposed by us, others had always been there.

2010:

  • embedded_innodb storage engine (now HailDB). This spurred many bug and API fixes in the Storage Engine interface. This was an education in all the corner cases of the various interfaces, where the obvious way to do things was almost always not fully correct (but mostly worked).
  • Engine and Schema custom KEY=VALUE options.
  • Found a bug in the pthread_mutex implementation of the atomics<> template we had. That was fun tracking down.
  • started writing more test code for the storage engine interface and upper layer. With our (much improved) storage engine interface, this became relatively easy to implement a storage engine to test specific bits of the upper layer.
  • Wrote a CREATE TABLE query that would take over four minutes to run. Fixed the execution time too. This only existed because of a (hidden and undocumented) limitation in the FRM file format to do with ENUM columns.
  • Characters versus bytes is an important difference, and one that not all of the code really appreciated (or dealt with cleanly)
  • FOREIGN KEY information now stored in the table protobuf message (this was never stored in the FRM).
  • SHOW CREATE TABLE now uses a library that reads the table protobuf message (this will later be used in the replication code as well)
  • Started the HailDB project
  • Updated the innobase plugin to be based on the current InnoDB versions.
  • Amazingly, noticed that the READ_COMMITTED isolation level was never tested (even in the simplest way you would ever explain READ_COMMITTED).
  • Created a storage engine for testing called storage_engine_api_tester (or SEAPITester). It’s a dummy engine that checks we’re calling things correctly. It exposed even more bugs and strangeness that we weren’t aware of.
  • Fixed a lot of cases in the code where we were using a large stack frame in a function (greater than 32kb).
  • An initial patch using the internal InnoDB API to store the replication log

2011:

  • This can be detailed later, it’s still in progress :)
  • The big highlight: a release.

Drizzle7

We’ve released Drizzle7! Not only that, we’re now calling it Generally Available – a GA release.

What does this mean? What does this GA label mean?

You could view as a GA label being “we’re pretty confident people aren’t going to on mass ask for our heads when they start using it”… which isn’t a too bad description. We also plan to maintain it, there could be future releases in this series that just include bug fixes – we won’t just immediately tell you to go and use the latest tarball or bzr tree. This release series is a good one to use.

Drizzle7 is something that can be packaged in Linux distros. It’s no longer something where the best bet is to add the PPA and upgrade every two weeks or build from source yourself. If you’re looking to deploy Drizzle (or develop against it) – you can rely on this release.

I’ll never use the words “production ready” to describe a release – it’s never up to me. It’s up to each person or organisation looking to deploy a piece of software to decide if that bit of software is production ready for them.

Personally, I’m looking forward to see how people can break it. While Drizzle is the best tested FOSS SQL RDBMS server, I’m sure there’s new an interesting ways it can be broken by saying we’re ready for a much larger crowd to hammer on it.

Overall, I think we’ve managed to take the now defunct MySQL 6.0 tree (way back in 2008) and release something that can truly live up to the line “database for cloud”. Drizzle is modern, modular, rather solid and understandable. The future is bright, there is so much more to do to make the ultimate database for cloud. Drizzle7 is a great platform to build on – both for us (developers) and us (people who use relational databases).

Fixed in Drizzle: No more “GOTCHA’s”

 

O'Reilly MySQL Conference & Expo 2011

At the upcoming MySQL Conference and Expo, I’m going to give a Thursday afternoon (2pm) session entitled Fixed in Drizzle: No more “GOTCHA’s”. I plan to have a lot of fun with this session..

If you go back to the very start of when I started submitting code to Drizzle (June 2008) – I was going and fixing some of my favourite “gotcha’s” inside the code: BUILD/ scripts that didn’t build the way releases would, wrappers on POSIX functions with different (and inconsistent) semantics, NETWARE support, a non thread safe client lib, my_errno (different to errno) etc. I won’t really be talking about internals like this – it may give me a happy but really isn’t the latest awesome in databases.

I’ll instead be going over the way more awesome user and DBA visible things we’ve fixed/added/removed from Drizzle that make it a database with as few GOTCHA’s as possible.

Authentication (pluggable, LDAP), Logging (to syslog, gearman), DATA_DICTIONARY, INFORMATION_SCHEMA, engines owning their own metadata, STRICT mode by default, removing global mutexes, improving the Storage Engine API, improving the replication log, including code such as PBXT and PBMS Blob Streaming, filesystem engine (read files from disk like a table), pluggable protocol, UTF8 by default, ENUM data type, auto_increment behaviour.

All this and more is “Fixed in Drizzle”.

(oh, and there’s no 24bit integer or a BLOB that can only be 255 bytes)

SQL Oddity: ALTER TABLE and default values

So, the MySQL (and Drizzle) ALTER TABLE syntax allows you to easily change the default value of a column. For example:

CREATE TABLE t1 (answer int);
ALTER TABLE t1 ALTER answer SET DEFAULT 42;

So, you create a TIMESTAMP column and forgot to set the default value to CURRENT_TIMESTAMP. Easy, just ALTER TABLE:

create table t1 (a timestamp);
alter table t1 alter a set default CURRENT_TIMESTAMP;

(This is left as another exercise for the reader as to what this will do – again, maybe not what you expect)

Timing queries in the 21st century (with LD_PRELOAD and sed)

So… Baron blogged about wanting higher precision timers from the mysql binary and that running sed on the binary wasn’t cutting it. However… I am not one to give up that easily!

This is what LD_PRELOAD was made for! Evil nasty hacks to make your life easier!

By looking at the mysql.cc source code, I can easily work out how this works… I just have to override two calls! They being sysconf() (we fake how many ticks per second there are) and times() (let’s return a much higher precision number).

Combined with the sed hack on the binary to change the sprintf call to print out the higher precision number, we have:

mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
|   710720 |
+----------+
1 row in set (1.080110 sec)

Get it from my junkcode: http://www.flamingspork.com/junkcode/high_precision_mysql_timer.c

(or you can get it from http://paste.drizzle.org/show/364/)

Is your Storage Engine buggy or the database server?

If your storage engine returns an error from rnd_init (or doStartTableScan as it’s named in Drizzle) and does not save this error and return it in any subsequent calls to rnd_next, your engine is buggy. Namely it is buggy in that a) an error may not be reported back to the user and b) everything may explode horribly when rnd_next is called after rnd_init returned an error.

Unless it is running on MariaDB 5.2 or (soon, when the patch hits the tree) Drizzle.

Monty (Widenius, not Taylor) wrote a patch for MariaDB based on my bug report that addressed that problem. It uses the compiler feature to throw a warning if the result of a function isn’t checked to make sure that all places that call rnd_init are checking for an error from the engine.

Today I (finally) pulled that into Drizzle as well.

So… if your engine does the logical thing and goes “oh look, this method returns an error… I’ll return my error” it will exhibit bugs in MySQL but not MariaDB 5.2 or Drizzle (when patch hits).

Which is buggy, the server or the engine?

The MySQL bug number is 54166, filed in June 2010.

Persistent index statistics for InnoDB

In browsing the BZR tree for lp:mysql-server, I noticed some rather exciting code had been merged into the Innobase code.

You may be aware that InnoDB will do some index dives when opening a table to get some statistics about the indexes that can help the optimiser make good query plans.

The problem being that this is many disk seeks. It means that on server restart, you have to spend a whole bunch of time seeking around the disk reading index pages.

Not any more.

There is now code merged in to store the calculated statistics in a table inside InnoDB so that these index dives don’t have to happen on startup.

Originally, this looked like it was going to make it into InnoDB+. The good news is that it’s now in a public source tree. I look forward to when it hits a stable release.

(hopefully somebody other than me can beat me to it and write a nice description of the algorithms involved… the code is pretty easy to follow, so it shouldn’t be hard)