Who is working on MariaDB 10.0?

There was some suggestion after my previous post (Who works on MariaDB and MySQL?) that I look at MariaDB 10.0 – so I have. My working was very simple, in a current MariaDB 10.0 BZR tree (somewhat beyond 10.0.3), I ran the following command:

bzr log -n0 -rtag:mariadb-10.0.0..|egrep '(author|committer): '| \
  sed -e 's/^\s*//; s/committer: //; s/author: //'| \
  sort -u|grep -iv oracle

 

MariaDB foundation/MontyProgram/SkySQL:

  1. Alexander Barkov
  2. Alexey Botchkov
  3. Daniel Bartholomew
  4. Elena Stepanova
  5. Igor Babaev
  6. Jani Tolonen
  7. knielsen
  8. Michael Widenius
  9. sanja
  10. Sergei Golubchik
  11. Sergey Petrunya
  12. Sergey Vojtovich
  13. timour
  14. Vladislav Vaintroub

Elsewhere:

  1. Kentoku SHIBA (4 commits)
  2. Lixun Peng (1 commit)
  3. Olivier Bertrand (212 commits)

From Oracle (i.e. revisions merged from Oracle MySQL):

  • 81 names (which I won’t list here as 81 is a lot)

The results are no different if you go back to the first revision that is different between MariaDB 5.5 and 10.0 (found using bzr missing). Even when grepping through the bzr log for things such as “patch by”, “contribution” or “originally”, I can only find one or two more names as original authors of patches (about the same as I can for patches going into the Oracle tree).

Please point me to revisions (revid is best way) that come from outside contributors as then I really can update this to show that there’s a larger developer community.

The current development version of Drizzle (7.2) has just as many contributors as the MariaDB development version (10.0) – although Drizzle does have fewer commits.

nanomysql – tiny MySQL client lib

I recently got pointed towards https://github.com/shodanium/nanomysql/ which is a tiny (less than 400 lines of C++) MySQL client library which is GPL licensed.

If you need to link into non-GPL-compatible code, there is the (slightly larger and more fully featured) libdrizzle library. But if you want something *tiny* and are okay with GPL, then nanomysql may be something to look at.

Who works on MariaDB and MySQL?

Looking at the committers/authors of patches in the bzr tree for MariaDB 5.5.31.

Non Oracle Contributors:

  1. Alexander Barkov
  2. Alexey Botchkov
  3. Elena Stepanova
  4. Igor Babaev
  5. knielsen
  6. Michael Widenius
  7. sanja
  8. Sergei Golubchik
  9. Sergey Petrunya
  10. timour
  11. Vladislav Vaintroub

Oracle (as they pull Oracle changes):

  1. Aditya A
  2. Akhila Maddukuri
  3. Alexander Nozdrin
  4. Anirudh Mangipudi
  5. Annamalai Gurusami
  6. Astha Pareek
  7. Balasubramanian Kandasamy
  8. Chaithra Gopalareddy
  9. Daniel Fischer
  10. Gleb Shchepa
  11. Harin Vadodaria
  12. Hery Ramilison
  13. Igor Solodovnikov
  14. Inaam Rana
  15. Jon Olav Hauglid
  16. kevin.lewis
  17. Krunal Bauskar
  18. Marc Alff
  19. Marko Mäkelä
  20. Mattias Jonsson
  21. Murthy Narkedimilli
  22. Neeraj Bisht
  23. Nisha Gopalakrishnan
  24. Nuno Carvalho
  25. Olav Sandstaa
  26. Pedro Gomes
  27. prabakaran thirumalai
  28. Praveenkumar Hulakund
  29. Ravinder Thakur
  30. Satya Bodapati
  31. sayantan.dutta
  32. Shivji Kumar Jha
  33. Sujatha Sivakumar
  34. Sunanda Menon
  35. Sunny Bains
  36. Thayumanavar
  37. Tor Didriksen
  38. Venkata Sidagam
  39. Venkatesh Duggirala
  40. Yasufumi Kinoshita

Observations:

  1. All the non-Oracle contributors work for SkySQL (and worked for Monty Program before that)
  2. Even going back to MariaDB 5.5.23, I can only find evidence for a maximum of 2-3 external contributions of code to MariaDB since then.
  3. In the same time frame (5.5.23-5.5.32) I see one or two external contributions going into Oracle trees, so it’s roughly the same.
  4. If you look at the contributors from Oracle over 5.5.23 to 5.5.32 there are closer to twice as many as the 40 listed above.

Somebody please correct me if I’m wrong here… perhaps MariaDB guys are just really bad at clearly marking commits that come from elsewhere? I’ve looked for “patch.*by”, “original” and “ontributed” and only turned up the above.

Are MariaDB tests adding anything extra over Oracle MySQL tests?

I grabbed all the tests introduced in MariaDB 5.5.32 (i.e. “bzr diff -rtag:mariadb-5.5.31..mariadb-5.5.32 mysql-test/” and some foo) and threw them in their own test file. I only kept tests for crashing bugs and ignored those that required plugins (there were two or three, but nothing major). So now I have a test file that should crash MariaDB 5.5.31 and probably before. But, the question is: does this crash Percona Server or MySQL?

While it is excellent to see the MariaDB guys including tests for their crashing bugs, are these MariaDB specific or do they affect other MySQL flavours?

I built a release build of top-of-trunk Percona Server and ran the test against it. I got no crashes. In a debug build, I got two. One was to do with REPAIR on an ARCHIVE table and the other was “SELECT UNIX_TIMESTAMP(STR_TO_DATE('2020','%Y'));”. I found the same thing for a debug build of top-of-tree MySQL.

All the other tests for crashing bugs, of which there were 14, were MariaDB-specific. So, out of 16 total, only two applied to Percona Server and MySQL.

Why do some foods taste absolutely AWESOME on playa? (a theory)

(I originally posted this to our camp mailing list, but I figured that the wider population of the internet may also be interested)

A couple of things prompted me thinking about this, and I shared my thoughts with Leah tonight and she’s been thinking the same kind of things.

We’ve observed that some food on playa is absolutely THE BEST THING EVER to enter our mouth holes while some things are a bit more meh than they should be. Basically,  we’re theorising that human taste buds change at Burning Man when compared to the default world.

A while ago we saw Heston Blumenthal trying to fix airline food. He ran an experiment where he tasted different concentrations of sweet, sour, bitter, salty and umami, first at sea level and then in a compartment pressurised to the cabin altitude of a plane.

So, let’s look at the differences in altitude and humidity between home,
playa and an airplane:

Altitude

Home: within a few hundred ft of sea level.
Black Rock Desert: 3,907ft (from Wikipedia)
Airplane cabin: 6,000ft (Boeing 787) to 6,900ft (Boeing 767) (or much closer to sea level if you’ve got a private jet)

Humidity

Home: 40-80% (inside/outside my house right now)
Airplane: 12-15%
Burning Man: 24% (average for August), low teens during the day (10-15% midday to midnight)

The result of the altitude and humidity for airplanes was:

  • threshold for tasting sweet is increased
    (i.e. need more sweetness to taste it)
  • threshold for tasting sour decreased
    (i.e. more sensitive to sour)
  • threshold for tasting bitter decreased
    (i.e. more sensitive to bitter)
  • threshold for tasting salt is increased
    (i.e. you need more salt)
  • umami was unchanged

This would explain why I always add salt to airplane food, why at one
point corn chips with vegemite was the best thing ever on playa and why
there’s this odd bacon obsession amongst so many.

This also explains why there may be a preference for less hoppy beers on
playa (or, if you’re me, a desire to try some insanely hopped ones to
see if I can notice an intensity increase).

Also, it’s one of the few places I can stomach US sodas (HFCS being
instant diabetes, but ginger ale on playa/in a plane is kinda nice).

One suggestion (and Heston tried this on flights) is nasal
douching… and I’m actually pretty keen to clean out the nose before
eating on playa because as we all know, playa up your nose is a fact of
life.

I’d be pretty interested to conduct some experiments both in normal
conditions and on playa with various concentrations of sweet, sour,
bitter, salty and umami and note down at what concentrations the flavor
is noticed.

I’m thinking:

  • sweet – sugar
  • sour – lemon juice? (store bought so it should be consistent)
  • bitter – not sure here, all I can think of is hops or Bitters
  • salty – salt :)
  • umami – liquid smoke?

It may require some experimentation to find what the minimum concentrations are. Once we’ve worked these out, we should be able to do a tasting and take notes when not on playa, then recreate it all on playa and compare results.

Thoughts?

HOWTO: Build a Monorail

At linux.conf.au 2012 I gave a lightning talk on our Burning Man 2010 art installation, the Nowhere2Nowhere monorail. I finally extracted the video of just my lightning talk and threw it up on YouTube for easy viewing:

popcon-historical: a tool for monitoring package popularity in debian/ubuntu

I’ve just uploaded (where ‘just’ is defined as “a little while ago”) popcon-historical to GitHub. It’s a rather rudimentary way to look at the popcon data from Debian and Ubuntu over time. It loads all the data into a Drizzle database and then has a small Perl web app to generate graphs (and CSV).

Github: https://github.com/stewartsmith/popcon-historical

I’ve also put up a project page on it: https://flamingspork.com/popcon-historical/

An example graph is this one of Percona Toolkit vs Maatkit installs in Ubuntu over time:

You can actually get it to graph any package. Unlike the graphs on debian.org, the package does not have to be in the Debian archive to be graphed over time; it can be a package from third-party repos.

“We open source it, and then developers show up and do work for free”

Those who have been around the free and open source software world long enough have heard “We open source it, and then developers show up and do work for free” at least once and have called bullshit on it at least once.

It turns out that people don’t go and work on software for free. They are either modifying software to scratch their own itch (in which case they’re getting 99+% of the code for nothing, so contributing a small bit back is the equivalent of paying for it – with their time rather than money) or it’s a good bit of fun.

So why do software projects that are dual licensed with a commercial license get fewer outside contributions? I think it’s quite simple: people don’t tend to spend their spare time making other people money while making none for themselves. Simply, these projects are left with only contributions from those being paid to work on it (usually by the company who sells the commercial license) and people/companies scratching an itch. Projects that aren’t dual licensed are more likely to have contributors from several companies as then it’s not all-but-one company spending time and money to make another company money.

Stewart’s dot twenty rule

I realised I haven’t written on this for a while and I was asked about it again today.

Stewart’s dot twenty rule is that a piece of software is never really mature until a dot twenty release.

This was a variant of “never use a dot zero release” which has been around the industry for a long time (i.e. always wait for X.0.1).

My first written observation on my variant on this rule was back in 2006:

This is a really stupid metric of software maturity. It is, however, disturbingly accurate.

It seems to continue to be both really stupid and disturbingly accurate. The first few point releases are still going to have rough edges, and once you get to about dot five you likely have something that’s intensely usable for a good number of people. By dot ten the more complex use cases should start to be okay, and once you get to dot twenty, then you could say it’s mature.

A topic for another time is how releasing often is one thing but maintaining a release is quite another.

An argument for popcon

There is a package called popularity-contest that’s available in both Debian and Ubuntu (and likely other Debian derivatives). It grabs the list of packages installed on the machine and submits it to the Debian or Ubuntu popularity contests.

There you can see which are the most popular packages in Debian and Ubuntu. Unsurprisingly, dpkg, the package manager, is rather popular.

Why should you enable it? Popcon results give you solid numbers on how many users you may have. Although the absolute numbers may not be too accurate, it’s a sample set, and if you examine the results over time you can start to get an idea of whether your software is growing in popularity or not.

But there’s something more than that: if you can prove that a lot of people are installing your software on Debian, then you’re likely going to be able to argue for more work time being spent on improving the packaging for Debian.

Quite simply, enabling popcon is a way to help people like me argue for more time being spent on making Debian better.

DevStack woes

DevStack is meant to be a three-step “get me an OpenStack dev environment” thing. You’re meant to be able to grab a fresh installation of something like Ubuntu 12.04 or Fedora, run “git clone $foo && cd devstack && ./stack.sh”, wait a while and then be able to launch instances.

This much does work.

What does not work is being able to ssh to those instances. The networking is completely and utterly broken. I have tried both Ubuntu and Fedora in a fresh VM (KVM, on an Ubuntu host) and have asked a variety of experts for help. No dice.

What I want to hear is a way to get it even remotely working locally, in a VM.

At the moment I’m tempted to submit a pull request to the devstack website adding a 4th step of “muck around for a few days before giving up on ever being able to ssh into a launched instance as these instructions are wrong”.

Switching to Fedora from Ubuntu

I’ve run Ubuntu on my desktop (well… and laptop) since roughly the first release back in 2004. I’ve upgraded along the way, with reinstalls on the laptop limited to changing CPU architecture and switching to full disk encryption.

Yesterday I wiped Ubuntu and installed Fedora.

Before Ubuntu, I ran Debian. I actually ran Debian unstable on my desktop/laptop. I ran Debian Stable on any machines that had to sit there and “just work” and were largely headless. Back then Debian stable just didn’t have a remotely recent enough kernel, X and desktop apps to really be usable with any modern hardware. The downside to this was that having an IRC client open to #debian-devel and reading the topic to find out whether sid (codename for the unstable distribution) was currently safe was pretty much a requirement if you ever thought about running “apt-get dist-upgrade”. This was exactly the kind of system that you wouldn’t give to non-expert family members as a desktop and expect them to maintain it.

Then along came Ubuntu. The basic premise was “a Debian derived distribution for the desktop, done right.” Brilliant. Absolutely amazingly brilliant. This is exactly what I wanted. I had a hope that I’d be able to focus on things other than “will dist-upgrade lead to 4 hours of fixing random things only to discover that X is fundamentally broken” and a possibility that I could actually recommend something to people who weren’t experts in the field.

For many years, Ubuntu served well. Frequent updates and relatively flawless upgrades between releases. A modern desktop environment, support for current hardware – heck, even non computer literate family members started applying their own security updates and even upgrading between versions!

Then, something started to go awry…. Maybe it was when Ubuntu shipped a kernel that helpfully erased the RAID superblock of the array in the MythTV machine… Maybe it was when I realized that Unity was failing as a basic window manager and that I swore less at fvwm…. Or maybe it was when I had a bug open for about 14,000 years on that if you left a Unity session logged in for too long all the icons in the dock would end up on top of each other at the top left of the screen making it rather obvious that nobody working on Ubuntu actually managed to stay logged in for more than a week. Or could it be that on the MythTV box and on my desktop having the login manager start (so you can actually log in to a graphical environment) is a complete crapshoot, with the MythTV box never doing it (even though it is enabled in upstart… trust me).

I think the final straw was the 13.04 upgrade. Absolutely nothing improved for me. If I ran Unity I got random graphics artifacts (a pulldown menu would remain on the screen) and with GNOME3 the unlock from screensaver screen was half corrupted and often just didn’t show – just type in your password and hope you’re typing it into the unlock screen and it hasn’t just pasted it into an IM or twitter or something. Oh, and the number of times I was prompted for my WiFi network password when it was saved in the keyring for AT LEAST TWO YEARS was roughly equivalent to the number of coffee beans in my morning espresso. The giant regressions in graphics further removed any trust I had that Mir may actually work when it becomes default(!?) in the next Ubuntu release.

GNOME3 is not perfect… I have to flip a few things in the tweak tool to have it not actively irritate me but on the whole there’s a number of things I quite like about it. It wins over Unity in an important respect: it actually functions as a window manager. A simple use case: scanning photos and then using GIMP to edit the result. I have a grand total of two applications open, one being the scanning software (a single window) and the other being the GIMP. At least half the time, when I switch back to the scanning program (i.e. it is the window at the front, maximized) I get GIMP toolbars on top of it. Seriously. It’s 2013 and we still can’t get this right?

So… I went and tried installing Fedora 19 (after ensuring I had an up to date backup).

The install went pretty smoothly, I cheated and found an external DVD drive and used a burnt DVD (this laptop doesn’t have an optical drive and I just CBF finding a suitably sized USB key and working out how to write the image to it correctly).

The installer booted… I then managed to rather easily decrypt my disk and set it to preserve /home and just format / and /boot (as XFS and ext3 respectively) and use the existing swap. Brilliant – I was hoping I wouldn’t have to format and restore from backup (a downside to using Maildir is that you end up with an awful lot of files). Install was flawless, didn’t take any longer than expected and I was able to reboot into a new Fedora environment. It actually worked.

I read somewhere that Fedora produces an initramfs that is rather specific to the hardware you’re currently running on, which just fills me with dread for my next hardware upgrade. I remember switching hard disks from one Windows 98 machine to another and it WAS NOT FUN. I hope we haven’t made 2013 the year of Windows 98 emulation, because I lived through that without ever running the damn thing at home and I don’t want to repeat it.

Some preferences had to be set again, there’s probably some incompatibility between how Ubuntu does things and how Fedora does things. I’m not too fussed about that though.

I did have to go and delete every sign of Google accounts in GNOME Online Accounts as it kept asking for a password (it turns out that two-factor-auth on Google accounts doesn’t play too nice). To be fair, this never worked in Ubuntu anyway.

In getting email going, I had to manually configure postfix (casually annoying to have to do it again) and procmail was actually a real pain. Why? SELinux. It turns out I needed to run “restorecon -r /home”. The way it would fail was silently and without any error anywhere. If I did “setenforce 0” it would magically work, but I actually would like to run with SELinux: better security is better. It seems that the restorecon step is absolutely essential if you’re bringing across an existing partition.

Getting tor, polipo and spamassassin going was fairly easy. I recompiled notmuch, tweaked my .emacs and I had email back too. Unfortunately, it appears that Chromium is not packaged for Fedora (well… somebody has an archive, but the packages don’t appear to be GPG signed, so I’m not going to do that). There’s a complaint that Chromium is hard to package, blah blah blah, but if Debian and Ubuntu manage it, surely Fedora can. I use different browsers for different jobs, and although I can use multiple instances of Firefox, it doesn’t show up as different instances in the alt-tab menu, which is just annoying.

It appears that the version of OTR is old, so I filed a bug for that (and haven’t yet had the time to build+package libotr 4.0.0 – but it’s sorely needed). The pytrainer app that I use to look at the results of my Garmin watch was missing some dependencies (bug filed) and I haven’t yet tried to get the Garmin watch to sync… but that shouldn’t be too hard…

The speakers on my laptop still don’t work – so it’s somebody screwing up either the kernel driver or pulseaudio that makes the speakers only sometimes work for a few seconds and then stop working (while the headphone port works fine).

On the whole, I’m currently pretty happy with it. We’ll see how the upgrade to Fedora 20 goes though…. It’s nice using a desktop environment that’s actually supported by my distribution and that actually remotely works.

An old note on the Storage Engine API

Whenever I stick my head into the MySQL storage engine API, I’m reminded of a MySQL User Conference from several years ago now.

Specifically, I’m reminded of a slide from an early talk at the MySQL User Conference by Paul McCullagh describing developing PBXT. For “How to write a Storage Engine for MySQL”, it went something like this:

  1. Develop basic INSERT (write_row) support – INSERT INTO t1 VALUES (42)
  2. Develop full table scan (rnd_init, rnd_next, rnd_end) – SELECT * FROM t1
  3. If you’re sane, stop here.

A lot of people stop at step 3. It’s a really good place to stop too. It avoids most of the tricky parts that are unexpected, undocumented and unlogical (yes, I’m inventing words here).
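To make that concrete, here is a rough sketch of what those first two steps look like against the handler API. This is a hand-waving illustration only: the class name ha_tiny and its in-memory row store are made up for the example, and the handlerton/plugin boilerplate plus the other methods a real engine must implement (open(), close(), create(), info(), position(), rnd_pos() and so on) are left out. It only builds inside a MySQL or MariaDB source tree.

/* Hypothetical minimal handler: class name and row store are made up.
 * A real engine also needs a handlerton, a plugin declaration, and
 * open()/close()/create()/info()/position()/rnd_pos() etc. */
#include <cstring>
#include <string>
#include <vector>
#include "handler.h"

class ha_tiny : public handler
{
  std::vector<std::string> rows;  /* stand-in for a real data store */
  size_t cursor;

public:
  ha_tiny(handlerton *hton, TABLE_SHARE *share)
    : handler(hton, share), cursor(0) {}

  /* Step 1: INSERT INTO t1 VALUES (42) arrives here as a packed row */
  int write_row(uchar *buf)
  {
    rows.push_back(std::string((char *) buf, table->s->reclength));
    return 0;
  }

  /* Step 2: SELECT * FROM t1 becomes rnd_init, rnd_next ... rnd_end */
  int rnd_init(bool scan) { cursor= 0; return 0; }

  int rnd_next(uchar *buf)
  {
    if (cursor >= rows.size())
      return HA_ERR_END_OF_FILE;      /* scan finished */
    memcpy(buf, rows[cursor++].data(), table->s->reclength);
    return 0;
  }

  int rnd_end() { return 0; }

  /* Step 3: everything else politely says "not supported" */
  int index_read_map(uchar *, const uchar *, key_part_map,
                     enum ha_rkey_function)
  { return HA_ERR_WRONG_COMMAND; }

  const char *table_type() const { return "TINY"; }
  ulonglong table_flags() const { return 0; }
  ulong index_flags(uint, uint, bool) const { return 0; }
  THR_LOCK_DATA **store_lock(THD *, THR_LOCK_DATA **to, enum thr_lock_type)
  { return to; }
};

Even that little is enough for the two statements above to work against a toy engine; it’s everything after this point (indexes, transactions, DDL, locking) where the undocumented corners live.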

MySQL vs Drizzle plugin APIs

There’s a big difference in how plugins are treated in MySQL and how they are treated in Drizzle. The MySQL way has been to create a C API in front of the C++-like (I call it C- as it manages to take the worst of both worlds) internal “API”. The Drizzle way is to have plugins be first class citizens and use exactly the same API as if they were inside the server.

This means that MySQL attempts to maintain API stability. This isn’t something worth trying for. Any plugin that isn’t trivial quickly surpasses what is exposed via the C API and has to work around it, or, it’s a storage engine and instead you have this horrible mash of C and C++. The byproduct of this is that no core server features are being re-implemented as plugins. This means the API is being developed in a vacuum devoid of usefulness. At least, this was the case… The authentication plugin API seems to be an exception, and it’s interesting to note that semisync replication is in fact a plugin.

So times may be changing… sort of. Yesterday I noted that some storage engine API features are only available if you’re InnoDB and I’ve voiced my general disappointment in the audit API being unsuitable to implement various forms of query logging already in the server (general query log, slow query log).

One thing to note: when the API is the same for both inside the server and a plugin, it makes initial refactoring very easy, and you quickly see the bits that could be improved.

Some storage engine features you only get if you’re InnoDB

I had reason to look into the extended secondary index code in MariaDB and MySQL recently, and there was one bit that I really didn’t like.

MariaDB:

share->set_use_ext_keys_flag(legacy_db_type == DB_TYPE_INNODB);

MySQL:

use_extended_sk= (legacy_db_type == DB_TYPE_INNODB);

In case you were wondering what “legacy_db_type” actually does, let me tell you: it’s not legacy at all, it’s kind of key to how the whole “metadata” system in MySQL works. For example, to drop a table, this magic number is used to work out what storage engine to call to drop the table.

Now, these code snippets basically kiss goodbye to the idea of a “pluggable storage engine” architecture. If you’re not InnoDB, you don’t get to have certain features. This isn’t exactly MySQL or MariaDB encouraging an open storage engine ecosystem (quite the opposite really).
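To illustrate the point, here is a purely hypothetical sketch of what an engine-neutral version of that check could look like: the engine advertises a capability through its handler flags and the server asks the handler, rather than special-casing a legacy_db_type value. HA_SUPPORTS_EXTENDED_KEYS is a made-up flag name, not something that exists in either tree.

/* Hypothetical only: HA_SUPPORTS_EXTENDED_KEYS is not a real server flag,
 * and the bit chosen here would collide with real HA_* flags. */
#define HA_SUPPORTS_EXTENDED_KEYS (1ULL << 40)

class ha_myengine : public handler
{
public:
  /* the engine says "my secondary indexes carry the primary key"... */
  ulonglong table_flags() const
  { return HA_SUPPORTS_EXTENDED_KEYS /* | the usual flags */; }
};

/* ...and the server asks the handler instead of checking the engine type:
 *   use_extended_sk= (file->ha_table_flags() & HA_SUPPORTS_EXTENDED_KEYS);
 * rather than
 *   use_extended_sk= (legacy_db_type == DB_TYPE_INNODB);
 */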

Having the MySQL server have this incredibly basic, busy and incomplete understanding of metadata has always been a bit of a mess. The code for reading a table definition out of the FRM file really does show its age, and has fingers all through the server.

If somebody was serious about refactoring server code, you’d certainly be looking here, as this code is a major source of arbitrary limitations. However, if you have the server and the engine(s) both having separate views of what is the “correct” state of metadata you end up with a mess (anyone who has had InnoDB be out of sync with FRMs knows this one). I worry that the FRM code will be replaced with something even less understandable by humans, again making the mistake that the server knows the state of the engine better than the engine does.

New libeatmydata release

Good news everyone! There’s a new libeatmydata release! I’ve put a source tarball up on the launchpad page: release-79.

This version packs:

  • RPM and debian packaging in tree
  • A bug fix so that O_SYNC and O_DSYNC are properly discarded on 32-bit machines, both with and without _FILE_OFFSET_BITS being set (see below for a sketch of the general interception technique).
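For anyone wondering how this sort of library works at all: it’s an LD_PRELOAD shim that interposes on the libc calls and quietly drops the “make this durable” requests. Below is a rough sketch of the general technique only, not the actual libeatmydata source: the real thing also intercepts fsync(), fdatasync(), sync() and deals with the open64()/_FILE_OFFSET_BITS variants that the 32-bit fix above concerns.

/* Rough sketch of an LD_PRELOAD sync-eating shim, compiled as C++:
 *   g++ -shared -fPIC -o eatsync.so eatsync.cc -ldl
 *   LD_PRELOAD=./eatsync.so some_program
 * Not the actual libeatmydata code; only open() is intercepted here.
 * RTLD_NEXT needs _GNU_SOURCE, which g++ defines by default on glibc. */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>

extern "C" int open(const char *pathname, int flags, ...)
{
  /* find the real open() behind us in the link order */
  static int (*real_open)(const char *, int, ...) =
      (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open");

  /* the whole point: quietly discard the synchronous-write flags */
  flags &= ~(O_SYNC | O_DSYNC);

  if (flags & O_CREAT)              /* O_CREAT means a mode argument follows */
  {
    va_list ap;
    va_start(ap, flags);
    mode_t mode = (mode_t) va_arg(ap, int);
    va_end(ap);
    return real_open(pathname, flags, mode);
  }
  return real_open(pathname, flags);
}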

I’d love to hear any feedback and receive any patches (hopefully things still work well on MacOS X and Solaris). So far, libeatmydata has had contributions from the following people, and many thanks to them:

  • Stewart Smith
  • Alexey Bychko
  • Blair Zajac
  • Phillip Susi
  • Modestas Vainius
  • Monty Taylor
  • Olly Betts
  • Pavel Pushkarev
  • Elliot Murphy
  • Eric Wong
  • Tamas TEVESZ
  • Joachim Berdal Haga
  • Mohsen Hariri

The EXAMPLE storage engine

The Example storage engine is meant to serve mainly as a code example of the stub of a storage engine, for example purposes only (or so the code comment at the start of ha_example.cc reads). In reality, however, it’s not very useful. It likely was back in 2004, when it could be used as a starting point for some simple new engines (my guess would be that more than a few of the simpler engines started from ha_example.cc).

The sad reality is that the complex and non-obvious bits of the storage engine API you actually care about are documented in ha_ndbcluster.cc, ha_myisam.cc and ha_innodb.cc. If you’re doing something that isn’t already done by one of those three engines: good luck.

Whenever I looked at ha_example.cc I always wished there was something more behind it… basically hoping that InnoDB would get a better and cleaner API with the server and would use that rather than the layering violations it has to do the interesting stuff.

That all being said, as a starting point, it probably helped spawn at least a dozen storage engines.

The ARCHIVE Storage Engine

I wonder how much longer the ARCHIVE storage engine is going to ship with MySQL…. I think I’m the last person to actually fix a bug in it, and that was, well, a good number of years ago now. It was created to solve a simple problem: write once read hardly ever. Useful for logs and the like. A zlib stream of rows in a file.

You can actually easily beat ARCHIVE for INSERT speed with a non-indexed MyISAM table, and with things like TokuDB around you can probably get pretty close to compression while at the same time having these things known as “indexes”.

ARCHIVE for a long time held this niche though and was widely and quietly used (and likely still is). It has the great benefit of being fairly lightweight – it’s only about 2500 lines of code (1130 if you exclude azio.c, the slightly modified gzio.c from zlib).

It also uses the table discovery mechanism that NDB uses. If you remove the FRM file for an ARCHIVE table, the ARCHIVE storage engine will extract the copy it keeps to replace it. You can also do consistent backups with ARCHIVE as it’s an append-only engine. The ARCHIVE engine was certainly the simplest example code of this and a few other storage engine API things.

I’d love to see someone compare storage space and performance of ARCHIVE against TokuDB and InnoDB (hint hint, the Internet should solve this for me).

The MySQL Cluster storage engine

This is one close to my heart. I’ve recently written on other storage engines: Where are they now: MySQL Storage Engines, The MERGE storage engine: not dead, just resting…. or forgotten and The MEMORY storage engine. Today, it’s the turn of MySQL Cluster.

Like InnoDB, MySQL Cluster started outside of MySQL. Those of you paying attention at home may notice a correlation between storage engines not written exclusively for MySQL and being at all successful.

NDB (for Network DataBase) started inside Ericsson, originally written in a language called PLEX, which was internal to Ericsson and used in the AXE telephone switches. Mikael Ronstrom’s PhD thesis covered NDB and even covered things that (at least at the time) were yet to be implemented (it’s been quite a few years since I leafed through it last). The project at Ericsson (IIRC) was shelved a couple of times, but eventually got spun out into an Ericsson Business Innovation company called Alzato.

Some remnants of PLEX can still be found in the NDB source code (if you look really hard that is). At some point the code was fed through a PLEX to C++ converter and development continued from there. Some of the really, really old parts of the source may seem weird either due to this or some hand optimization for SPARC processors in the 1990s.

In 2003, MySQL AB acquired Alzato and work on a storage engine plugin for MySQL to interface to the (C++ API only) NDB was underway. Seeing as the storage engine interface was so simple, easy and modular it would only take several years for the interface to NDB to become mature.

The biggest problem: NDB itself worked really well if your workload fit exactly what it was good at… if you deviated, horrific performance and/or crashes were not as uncommon as we’d have liked. This was a source of strain for many years with the developers and support team on one side and some of the less-than-careful sales team on the other. That being said, there have been some absolutely awesome sales people selling NDB into markets it truly fits, and this is why there’s barely a place in the world where placing a mobile phone call doesn’t go through MySQL Cluster at some point.

You should read Tomas Ulin’s post Celebrating 10 years @MySQL for a bit of an insight into how Alzato became part of MySQL AB (which later became part of Sun which became part of Oracle).

I joined the MySQL Cluster team at MySQL in December 2004, not too long after Alzato was acquired, but certainly when the NDB storage engine in MySQL 4.1 was in its very early stages – it was then by no means a general purpose database.

Over the years, MySQL Cluster gained both traction and features, making it useful for more applications. One of the biggest marketing successes of MySQL was the storage engine architecture and how you could just “plug in” different engines. The reality (of course) was far different and even though MySQL Cluster did just “plug in” to MySQL, it was certainly not a drop in replacement.

In MySQL 5.0, a bunch of neat new features were added:

  • Engine condition pushdown
    This enabled conditions on non-indexed columns to be evaluated on the data nodes rather than having every row pulled up to the SQL node to be evaluated.
  • Batched read interface
    So that queries like SELECT FOO FROM BAR WHERE A IN (1,2,3) were executed as a single network round trip rather than 3 round trips.
  • Query cache
    Although the query cache should die, hey, at least it worked with NDB now…. in a way.
  • Reduced IndexMemory usage
    Remember, NDB is an in-memory database, so saving a bunch of bytes for secondary indexes was a big thing.

The first release with things I really worked on was MySQL 5.1. My first talk (to a packed room) at the MySQL User Conference in 2006 was on new features in MySQL Cluster 5.1. I’m still quite proud of that talk, even though I know I am a much better speaker now than I was then (it would have been great to have had more guidance… but hey, learning from experience is good too).

We added a lot in 5.1:

  • Integration with replication
    This is where row based replication was born. It was a real team effort, with the NDB kernel part (going from memory and bzr logs) having been written by Tomas, and Jonas seems to have a bunch of code there too. I worked a bunch on the NDB Injector thread in mysqld, Mats worked on the core row based code (at the time the most C++-like code in the entire MySQL world). You could now have a cluster replicate to another cluster, with the giant bottleneck that is MySQL replication.
  • disk data
    You could store non-indexed columns on disk. I implemented the INFORMATION_SCHEMA.FILES table for this, I was young and naive enough to think that the InnoDB guys would also fill out this table and all would be happy with the world (I’m lucky I haven’t been holding my breath on this one).
  • Variable Sized columns
    A VARCHAR(255) would no longer always use the full 255 bytes if you just stored a single character in it. Catch? Only for in-memory columns.
  • User defined partitioning
    Because NDB desperately needed more options, we let the user choose how they wanted to partition up their data (per table).
  • Autodiscovery of schema changes
    This was a giant workaround to the epic mess that is FRM files and data dictionary things inside the MySQL Server. It is because of all this code that when I went to rewrite the whole thing for Drizzle I took the approach of “just pass it down to the engines, the server must not attempt to know better”. FWIW, I’m still right: if the server tries to be clever you now have two places for bugs to be, not just one.
  • Distribution awareness
    i.e. better selection of which data node to talk to for a particular query, reducing latency.
  • Online add/drop index.
    How long did it take for other engines to get this? Let’s not think about that :)

After that the really interesting stuff started to happen, that is, the first major fork of MySQL: MySQL Cluster Carrier Grade Edition (CGE). Why? We had customers that simply couldn’t wait for MySQL 6.0 (after all, they’d still be waiting).

We had MySQL Cluster CGE 6.1, 6.2, 6.3 and now we’re into 7.0, 7.1 and 7.2. It is without doubt the longest serving and surviving MySQL fork. There were non-trivial changes inside the MySQL server too, which caused enough of a merge problem for the (small) Cluster team.

One big thing that you’re probably still all waiting for? Replication conflict detection and resolution in circular/multi-master replication setups. It was an NDB first and has been used in production for a decent amount of time.

I remember a hack while on an airplane that led to the CompressedBackup and CompressedLCP options (using zlib when writing out checkpoints/backups) – something that took more time than you’d think to go from prototype to production-ready code.

The last few things I worked on in MySQL Cluster before going and working full time on Drizzle was the Windows port, online add/drop node and NDBINFO.

I’ve left out so many cool MySQL Cluster things that were worked on over the years (e.g. online add/drop column, rewriting of LCP code, micro GCPs, crash-safe DDL, the test suite). I really should mention the test suite: in lines of code it was over three times that of MyISAM… and that was probably six years ago that I worked that out.

One thing to think about: when Innobase Oy was bought by Oracle and there was this effort to have a transactional storage engine that was inside MySQL AB rather than another company, I pointed out that I thought it would take less time adding the needed features to NDB and integrating it inside the MySQL server binary (and with the addition of online add node you could go from stand alone DB server to a full cluster with no down time) than it would for any of the alternatives to get to a suitable level of maturity.

I wish I’d put money on this… I put money on the MySQL 5.1 GA release date (which I was happy to lose), but in the years since you can see that InnoDB is still reigning supreme, with all that came to replace it having fallen away for one reason or another. It’s still on track to have MySQL Cluster be the only real alternative (now also, funnily enough, owned by Oracle). I have to say, it’s kind of a hollow victory though; it would have been nice to see Falcon and PBXT be serious players in today’s market.