Other MySQL branch code sizes

Continuing on from my previous posts, “MySQL code size over releases” and “MariaDB code size”, I’ve decided to also look into some other code branches. I’ve used the same methodology as in those posts: sloccount for C and C++ code only.

There are also other branches around in pretty widespread use (if only within a single company). I grabbed the Google, Facebook and Twitter patches and examined them too, along with Percona Server 5.1 and 5.5.

Codebase                   LoC (C, C++)   +/- from MySQL
Google v4 patch 5.0.37     970,110        +26,378 (from MySQL 5.0.37)
MySQL@Facebook             1,087,715      +15,768 (from MySQL 5.1.52)
Twitter 5.5.29.t10         1,192,718      +3,624
Percona Server 5.1 trunk   1,066,418      +14,878 (from MySQL 5.1.66)
Percona Server 5.5 trunk   1,208,577      +19,483 (from MySQL 5.5.29), +142,159 (from PS 5.1)
Drizzle trunk              334,810
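
Since each delta is just the difference between two sloccount totals, the upstream total behind each row can be recovered by subtraction. Here is a minimal sketch using only the figures from the table; the dictionary layout and the name `baseline_and_growth` are made up for illustration and are not part of sloccount.

```python
# (total LoC, delta from upstream MySQL), copied from the table above
BRANCHES = {
    "Google v4 patch 5.0.37": (970_110, 26_378),
    "MySQL@Facebook": (1_087_715, 15_768),
    "Twitter 5.5.29.t10": (1_192_718, 3_624),
    "Percona Server 5.1 trunk": (1_066_418, 14_878),
    "Percona Server 5.5 trunk": (1_208_577, 19_483),
}


def baseline_and_growth(total, delta):
    """Return (implied upstream total, growth over upstream in percent)."""
    baseline = total - delta
    return baseline, 100.0 * delta / baseline


for name, (total, delta) in BRANCHES.items():
    baseline, pct = baseline_and_growth(total, delta)
    print(f"{name:26} upstream {baseline:>9,}  +{pct:.2f}%")
```

A handy sanity check falls out of this: the Twitter and Percona Server 5.5 rows both imply the same upstream total, which is what you would expect if both are measured against MySQL 5.5.29.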

The Google patch has always had a reputation for being large, and with an extra 26 kLOC it certainly is the biggest of the more current branches. I’m actually surprised that it adds this much code.

The Facebook and Percona Server 5.1 branches are amazingly similar in how much extra code they add, and they’re not carbon copies of each other. The Twitter patch is quite notable for how little extra code it adds.

For giggles, I included Drizzle – which is (even with all the plugins) less than a third of the size of MySQL 5.1.

It’s clear that the Percona Server and Facebook patches introduce much less code than MariaDB does, which does go with the general wisdom of them being closer to Oracle MySQL than MariaDB is.

If we look at Percona Server, we see that Percona Server 5.5 does indeed have a bunch more code than Percona Server 5.1: roughly 5,000 more lines than we’d expect from a simple port from MySQL 5.1 to MySQL 5.5. This feels about right, as we’ve added new things to Percona Server 5.5 that weren’t in Percona Server 5.1.
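
To make that arithmetic explicit, here is a small sketch that splits the growth between the two Percona Server releases into the part that comes from upstream MySQL and the part that is extra Percona code. The numbers are the ones from the table; the function name `port_breakdown` is hypothetical, and "a simple port" is taken to mean carrying the same amount of added code forward.

```python
def port_breakdown(old_total, old_delta, new_total, new_delta):
    """Split growth between two branch releases.

    Returns (extra branch code beyond a straight port,
             growth of the upstream codebase itself).
    """
    extra_branch_code = new_delta - old_delta
    upstream_growth = (new_total - new_delta) - (old_total - old_delta)
    return extra_branch_code, upstream_growth


extra, upstream = port_breakdown(
    old_total=1_066_418, old_delta=14_878,   # Percona Server 5.1 vs MySQL 5.1.66
    new_total=1_208_577, new_delta=19_483,   # Percona Server 5.5 vs MySQL 5.5.29
)
print(f"extra Percona code in 5.5: {extra:,}")   # 4,605
print(f"upstream MySQL growth:     {upstream:,}")  # 137,554
```

The two parts add up to the +142,159 in the table, so almost all of the growth between Percona Server 5.1 and 5.5 comes from MySQL itself.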

11 thoughts on “Other MySQL branch code sizes”

  1. I think there might be something wrong with the way lines are counted:

    $ git diff mysql-5.5.29..mysql-5.5.29.t10 --stat
    429 files changed, 23707 insertions(+), 1766 deletions(-)

    $ git diff mysql-5.5.29..mysql-5.5.29.t10 --stat --relative sql/ include/ client/ mysys/ storage/ unittest/
    142 files changed, 7110 insertions(+), 821 deletions(-)

  2. That’s because I’m not counting diff size, I’m doing differences between sloccount totals. This means that these counts are more a “hey, what did they add” rather than “went and changed code to fix bugs”.

    I plan to look at diff size in the not too distant future, as it likely tells a very different story.

  3. It might be worth updating the post to reflect that. Also, if I understand correctly, it’s “hey, what new lines they added”. Which, in my humble opinion, is meaningless because if you add 10 lines to a file, but later remove 10, the count will be zero using this methodology.

  4. Yep.. it’s flawed. So is diffstat too though, as if you just ran indent over everything you’d look like you completely rewrote the thing :)
