MySQL Event Scheduler status

Over in the MySQL Manual section on Event Scheduler Status you get to find out that in MySQL 5.1.11 you got to find out the status of the event scheduler by using the SHOW SCHEDULER STATUS query. Any version since then you need SUPER and have to use mysqladmin debug to get any information on the status of the event scheduler itself.

INNODB_SYS_FIELDS vs INNODB_SYS_COLUMNS

In MySQL 5.6 we have two new INFORMATION_SCHEMA tables for InnoDB that are likely going to cause confusion: INNODB_SYS_FIELDS and INNODB_SYS_COLUMNS. You may think these are likely to just be aliases of each other in order to make your life easier. However…

These are not the same thing. The INNODB_SYS_FIELDS table is all about key columns (fields) of InnoDB indexes, while INNODB_SYS_COLUMNS is about actual columns. This is even more confusing as within the MySQL source code, there is the Field set of objects that manipulate fields (columns) in a row.

Blegh. I’m glad it’s Friday.

diffstat of MySQL 5.6 versus 5.5

Yesterday I wrote about what the diffstat between MySQL 5.5 and MariaDB 5.5 was, and previously to that, about the MariaDB code size as reported by sloccount. Let’s look at MySQL 5.6.

A naive wc based “lines of code” for MySQL 5.6 sql/ directory is ~490kLOC which contasts with MySQL 5.5 being ~375kLOC by the same measure. If we diffstat the sql/ directory like I did for MariaDB 5.5 we get:

 357 files changed, 172871 insertions(+), 67922 deletions(-)

Versus, as you remember from yesterday for MariaDB 5.5 over MySQL 5.5:

 250 files changed, 83639 insertions(+), 23090 deletions(-)

The MySQL 5.5 to 5.6 sql/ changes line up with What I found in my post MySQL modularity, are we there yet? in that the core server code for MySQL has grown by about 100,000 lines of code.

The jump from MySQL 5.5 to MariaDB 5.5 is a smaller one than jumping from MySQL 5.5 to MySQL 5.6, at least in terms of changed server code.

A judgement all on if a smaller diff is a safer jump or not will rest more with the quality of that code more than anything else. As we’ve seen previously, modularity isn’t coming to the MySQL code base any time soon.

So what about the diffstat of MariaDB compared to MySQL?

So, I’ve looked at what sloccount says on the differences between Oracle MySQL over versions of itself and the various MySQL branches around. What I haven’t looked at is the diffstat. Firstly, let’s look at MariaDB.

I’m going to look at MariaDB 5.5.29 as compared to MySQL 5.5.29, both checked out from bzr. A naive diffstat would give us:

 5261 files changed, 1086165 insertions(+), 122751 deletions(-)

And this looks like an awful lot of code that has changed: about 1,086,165 lines! This actually includes a whole other copy of InnoDB in the form of XtraDB. If we take that into account we get:

 5032 files changed, 864997 insertions(+), 125099 deletions(-)

Which is still incredibly high. Let’s look at what’s changed though. We actually see a bunch of changes in the test suite, some of which are relatively harmless, while others, like the change to rpl_tests/rpl_innodb.test have a “–replace_result MyISAM InnoDB” line added to them, which is awfully odd (possibly legitimate, but it stuck out).

In the end, I came up with this diff command which I think leaves us with a best diff for what is the code difference between MySQL 5.5 and MariaDB 5.5:

 diff -Nru --exclude=BUILD* --exclude=.bzr* --exclude debian* \
--exclude=man* --exclude=mysql-test* --exclude=win* \
--exclude=unittest* --exclude=test* \
--exclude=support-files* --exclude=README \
--exclude=Docs --exclude=CMakeLists.txt \
--exclude=COPYING.LESSER --exclude=INSTALL* \
--exclude=KNOWN_BUGS.txt \
--exclude=cmake* mysql-5.5.29/ mariadb-5.5.29/

This is not to discount the build and test changes that MariaDB have made, but in this case I feel they distort the numbers a bit and I’ve previously just been counting C and C++ code, so it’s probably fairer this way.

We end up with a diffstat of:

 1156 files changed, 326081 insertions(+), 42751 deletions(-)

If we then exclude the copyright notice changes and any whitespace by changing the start of the diff command to this:

diff -NruiEbwB --ignore-matching-lines='Copyright.*Monty' \
--ignore-matching-lines='Copyright.*Oracle'

We end up with a diffstat of:

 1129 files changed, 322821 insertions(+), 39588 deletions(-)

Which is a little different to what I found in my previous post (MariaDB code size) that just used sloccount. There we found that MariaDB 5.5 was 187,000 more lines of code than MySQL 5.5 while here we find the difference to be 283,000 lines of code. I suspect these differences to be in how diff and sloccount count things. If you do a naive count of the number of lines in source files in the sql/ directory you get 375kLOC while sloccount says 256kLOC.

There is still some noise in this number as there’s some Copyright notices for some of the strings code that changes, but this doesn’t seem to be too much. What about server code though? If we just diffstat the sql/ directory (core server code), then we get:

 250 files changed, 83639 insertions(+), 23090 deletions(-)

Which is still nothing to sneeze at, sloccount tells me that MySQL 5.5.29 only has 256kLOC in the sql/ directory to begin with and a naive wc count to be about 375kLOC.

Which is bigger: MySQL or PostgreSQL?

From my previous posts, we have some numbers (excluding NDB) for the size of MySQL, so what about PostgreSQL? Here, I used PostgreSQL git trunk and classing things in the contrib/ directory as plugins. I put the number of lines of code in the src/backend/storage directory down as storage engines LoC but did not count it as non-kernel code.

Version Total LoC Plugin LoC Storage Engines LoC Remaining (kernel)
MySQL 5.5.30 858,441 2,706 171,009 684,726 (79% kernel)
MySQL 5.6.10 1,049,344 29,122 236,067 784,155 (74% kernel)
MariaDB 5.5 1,142,118 11,781 304,015 826,322 (72% kernel)
Drizzle trunk 334,810 31,150 130,727 172,933 (51% kernel)
PostgreSQL trunk 648,691 61,934 17,802 586,757 (90% kernel)

What we can see is that the PostgreSQL kernel size is actually smaller than any recent MySQL version (5.1 was slightly smaller). This is rather interesting as it is generally thought that PostgreSQL does more than MySQL. What’s more telling is that total code size, PostgreSQL is about half of MySQL 5.6 or MariaDB 5.5. Only Drizzle ends up being smaller, which makes sense as it “does less”.

Is MySQL bigger than Linux?

I’m going to take the numbers from my previous post, MySQL Modularity, Are We There Yet? for the “kernel” size of MySQL – that is, everything that isn’t a plugin or storage engine.

For Linux kernel, I’m just going to use the a-bit-old git tree I have on my laptop. I’ve decided that the following directories are for “plugins” drivers/ arch/ sound/ firmware/ crypto/ usr/ virt/ tools/ scripts/ fs/*/* and everything else is core kernel code.

Version Total LoC Total Plugin LoC Remaining (kernel)
MySQL 5.6.10 1,049,344 265,189 784,155 (74% kernel)
MariaDB 5.5 1,142,118 315,796 826,322 (72% kernel)
Linux 9,983,269 8,824,121 1,159,148 (11% kernel)

The scary thing is that it’s surprisingly close, MySQL/MariaDB core is roughly 68-71% the  size of the Linux kernel. This is probably an unfairly large number for Linux too as there’s much more of Linux that is pluggable and modular… so I actually suspect they’re closer to exactly the same size.

If we look at the net/ directory in linux, it’s a grand total of 493,000 lines of code, all of which is fairly modular and independent. You could, quite reasonably, claim that the core of Linux is in fact closer to half a million lines of code than a million, making MySQL significantly larger.

So how many engineers are looking after each code base? We know there are over a thousand Linux kernel developers contributing to each release (e.g. https://lwn.net/Articles/395961/ for data back in 2010, and https://lwn.net/Articles/537110/ for Feb 2013).

I’m now going to fudge some things to attempt to work out how many “developers” are working on linux core code rather than drivers and arch specific things. I work out there’s probably about 20-25% of linux developers who work on things that are not drivers, filesystems or arch code. This is around 250-300 developers for each kernel release.

So… how many people have ever committed code to MySQL? This is fairly easy to find out: I simply looked at the entire bzr history, grepped out every committer and then uniqued the list (this required more than just sort -u as people used different email addresses and names). How many people have ever committed code to MySQL (i.e. their code can be found in the MySQL 5.6 bzr tree)? 312.

How many committers to MySQL 5.6 are there? 161. This is pretty amazing, that’s about half of what the total is. However, this number is misleading. For example, my name is there and the last commit to the MySQL tree from me was in 2008. You also see names such as Monty Taylor and Kristian Nielsen – all three of us not having worked for MySQL/Sun/Oracle for a great number of years. At the very least, there’s been a lot of code integration into MySQL 5.6 from many existing sources that were not previously in MySQL trunk.

MySQL modularity, are we there yet?

MySQL is now over four times the size than it was with MySQL 3.23. This has not come in the shape of plugins.

Have we improved modularity over time? I decided to take LoC count for plugins and storage engines (in the case of Drizzle, memory, myisam and innobase are storage engines and everything else comes under plugin). I’ve excluded NDB from these numbers as it is rather massive and is pretty much still a separate thing.

Version Total LoC Plugin LoC Storage Engines LoC Remaining (kernel)
MySQL 3.23.58 371,987 0 (0%) 176,276 195,711 (52% kernel)
MySQL 5.1.68 721,331 228 237,124 483,979 (67% kernel)
MySQL 5.5.30 858,441 2,706 171,009 684,726 (79% kernel)
MySQL 5.6.10 1,049,344 29,122 236,067 784,155 (74% kernel)
MariaDB 5.5 1,142,118 11,781 304,015 826,322 (72% kernel)
Drizzle trunk 334,810 31,150 130,727 172,933 (51% kernel)

I’ve used the non-plugin and non-storage engine code size to be the database “kernel” – i.e. the core of the database server.

What I find really interesting here is that yes, the amount of code that is to some degree modular has increased. The amount of code that is a MySQL plugin is still very small compared to the server size

Drizzle is 20-25% of the size of a modern MySQL or MariaDB server and for many applications does largely or exactly the same thing.

Other MySQL branch code sizes

Continuing on from my previous posts, MySQL code size over releases and MariaDB code size I’ve decided to also look into some other code branches. I’ve used the same methodology as my previous few posts: sloccount for C and C++ code only.

There are also other branches around in pretty widespread use (if only within a single company). I grabbed the Google, Facebook and Twitter patches and examined them too, along with Percona Server 5.1 and 5.5.

Codebase LoC (C, C++) +/- from MySQL
Google v4 patch 5.0.37 970,110 +26,378 (from MySQL 5.0.37)
MySQL@Facebook 1,087,715 +15,768 (from MySQL 5.1.52)
Twitter 5.5.29.t10 1,192,718 +3,624
Percona Server 5.1 trunk 1,066,418 +14,878 (from MySQL 5.1.66)
Percona Server 5.5 trunk 1,208,577 +19,483 (from MySQL 5.5.29) +142,159 (from PS 5.1)
Drizzle trunk 334,810

The Google patch has always had a reputation of being large, and with an extra 26kLOC of code, it certainly is the biggest of any of the more current branches – and that’s actually a surprise to me that it adds this much code.

The Facebook and Percona Server 5.1 branches are amazingly similar in how much extra code they add, and they’re not carbon copies of each other. The Twitter patch quite notable for how little extra code it adds.

For giggles, I included Drizzle – which is (even with all the plugins) less than a third of the size of MySQL 5.1.

It’s clear that the Percona Server and Facebook patches introduce much less code than MariaDB does, which does go with the general wisdom of them being closer to Oracle MySQL than MariaDB is.

If we look at Percona Server, we see that with Percona Server 5.5 there is indeed a bunch more code than was in Percona Server 5.1, with roughly 5,000 more lines of code than we’d expect from a simple port from MySQL 5.1 to MySQL 5.5. This feels about right, we’ve added new things to Percona Server 5.5 that weren’t in Percona Server 5.1.

MariaDB code size

Continuing on from my previous post, MySQL code size over releases.

I wanted to look at the different branches/patch sets of MySQL out there and work out how far from upstream they deviated. I’m just going to compare against whatever upstream version the most easily accessible version is based on (be it 5.0.x, 5.1.x or whatever).

For MariaDB versions, I removed innodb_plugin and replaced it with xtradb for stats purposes as the MariaDB innodb_plugin is essentially the same as upstream and I don’t want to artificially inflate the diff size.

The first three major versions of MariaDB were all based on MySQL 5.1. I used sloccount and only counted C and C++ code.

So, let’s look at some of the MySQL patch sets/branches that are around. Firstly, let’s look at MariaDB:

Codebase LoC (C, C++) +/- from MySQL +/- from prev maj Version
MariaDB 5.1 1,210,168 +157,532 0
MariaDB 5.2 1,227,434 +174,798 +17,266 (since MariaDB 5.1)
MariaDB 5.3 1,264,995 +212,359 +37,561 (since MariaDB 5.2)
MariaDB 5.5 1,377,405 +187,658 (from MySQL 5.5) +112,410 (since MariaDB 5.3)

From my previous post on lines of code in MySQL versions, we learned that with MySQL 5.6 we saw a 354kLOC increase over MySQL 5.5. What is quite surprising is how close some of the MariaDB differences are to this. With MariaDB 5.5, we’re looking at a 187kLOC difference, which is roughly two thirds that of MySQL 5.6. What’s also interesting is that each incremental MariaDB release has not added nearly as much code as the MySQL 5.1 to 5.5 and 5.5 to 5.6 jumps did.

MariaDB LoC over major versions

The MariaDB code size has also been increasing, if we look at the graph above  you can really see the jump in code size over the past few releases.

If we look at the delta between MariaDB and MySQL, the first MariaDB release (MariaDB 5.1) was certainly a large jump. Each incremental MariaDB release (5.2 and 5.3) have been a smaller delta than the initial one. With MariaDB 5.5 we actually decrease the delta from MySQL, which is something that’s interesting to look at.

If we were going a straight port of MariaDB 5.3 to be based off MySQL 5.5, we’d expect the delta to be around 137kLOC (what MySQL 5.1 to 5.5 is) but it isn’t. The difference to MariaDB 5.5 from MariaDB 5.3 is only ~112kLOC, and the on the whole delta decreases.

But what makes up this big initial jump for MariaDB? Let’s look at some of the MariaDB 5.1 only modules and what’s left:

MariaDB 5.1 component LoC (MariaDB 5.1)
PBXT 45,107
FederatedX 3,076
IBM DB2i 13,486
Total 61,669
Other 95,863

So the MariaDB delta is not increase just because they included some existing modules, there’s more code in there, about as much as any major MySQL version bump.

Tomorrow we look at other MySQL branches, and we see that the MariaDB delta truly is significantly larger than any other MySQL branch.

MySQL code size over releases

As the start of a bit of a delve into the various MySQL branches and patch sets that have been around, let’s start looking at the history of MySQL itself. This is how big MySQL has been over all of the major releases since the beginning (where beginning=3.23). (edit: These numbers were all gathered using sloccount and only counting C++ and C source files.)

Codebase LoC (C, C++) +/- from previous MySQL
MySQL 3.23.58 371,987 0
MySQL 4.0.30 368,695 -3,292 (from MySQL 3.23)
MySQL 4.1.24 859,572 +490,877 (from MySQL 4.0)
+174,352 excluding NDB
MySQL 5.0.96 916,667 +57,095 (from MySQL 4.1)
MySQL 5.1.68 1,052,636 +135,969 (from MySQL 5.0)
MySQL 5.5.30 1,189,747 +137,111 (from MySQL 5.1)
MySQL 5.6.10 1,544,202 +354,455 (from MySQL 5.5)
increase in MySQL source code size over version

MySQL code size over major versions

Note the sharp increase in 5.6

LoC Delta for major MySQL release

We can see that MySQL has had some interesting code size changes over time, the big jump in 4.1 over 4.0 was mostly due to the introduction of MySQL Cluster, but even so, it was a big jump.

MySQL 5.6 is the largest MySQL code size increase in a MySQL version ever. The last time we saw anything like this was with the merging of MySQL Cluster in 4.1. At the very least, Oracle is paying people to write lines of code to extent that nobody has before.

unireg.h is finally gone

I got rid of unireg.cc way back in 2009 as I rewrote all the FRM related code inside Drizzle to instead use a nice protobuf based structure. If you’re wondering what was there, I just quote this part of pack_screens() from unireg.cc in MySQL 5.6:

start_row=4; end_row=22; cols=80; fields_on_screen=end_row+1-start_row;

We have gradually pulled things out of unireg.h over the years too. But, let’s go back to ask the question “What is UNIREG?”. To answer that, I’m going to quote from something that was current back when MySQL 3.22 was the latest and greatest:

In 1979, he developed an in-house database tool called UNIREG for managing databases. Since 1979, UNIREG has been rewritten in several different languages and extended to handle big databases.

No doubt the definition of big has changed for most people since then. If we take what the MySQL 3.23 manual said:

Unireg is our tty interface builder, but it uses a low level connection to our ISAM (which is used by MySQL) and because of this it is very quick. It has existed since 1979 (on Unix in C since ~1986).

This is where the FRM file comes from, and likely where unireg.cc and unireg.h in MySQL  originated from. Since we didn’t want to have the limitations of FRM files in Drizzle, and since all code that mentioned UNIREG was generally not modern, we’ve been gradually removing bits or refactoring it since.

So, what was in unireg.h to begin with? I peeked into the MySQl 5.6 unireg.h to remind myself:

  • defines for libc5
  • defines for the MySQL thing that isn’t gettext
  • the PLUGINDIR define
  • some macros for errors
  • a bunch of SPECIAL_ prefixed defines for a bitfield which is a truly odd thing.
  • store_record(), restore_record(), cmp_record(), empty_record() macros
  • defines for flags to openfrm(), openprt() and openfrd() (the latter two not being present in MySQL)
  • defines used in relation to INFORMATION_SCHEMA tables and triggers
  • the BIN_LOG_HEADER_SIZE define
  • the DEFAULT_KEY_CACHE_NAME define

Soo… really a whole bunch of stuff that should never have been put there in the first place as most of that has clearly come in after it was MySQL.

But I’m getting sidetracked, the main point was that I looked at what was remaining in unireg.h inside Drizzle and it was just one thing: Code to abort the server in certain odd situations (that we’d completely rewritten so had no relation to unireg). So, I renamed it.

Renaming a file is a pretty minor change, but that hides all the work leading up to that point so that now I can go and explain Drizzle internals while making one less reference to UNIREG.

Now to go and remove the last tiny bit of unireg_check…