MySQL 5.7 on POWER

In a previous post, I covered porting MySQL 5.6 to POWER and subsequently, some new record performance numbers with MySQL 5.6.17 on POWER8.

Well, those following at home will be aware that not only is the next sentence sponsored by IBM Legal, but that MySQL 5.7 alleviates a bunch of the mutex contention that we saw with MySQL 5.6. The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

In looking at MySQL performance on POWER, it’s inevitable that I should look at MySQL 5.7 and what’s coming up in the next stable release of MySQL.

Surprisingly, a bunch of the core code in InnoDB and MySQL dealing with mutexes has changed in MySQL 5.7 when compared to MySQL 5.6. Enough that I actually had to post a few bug reports about the changes that apply to any CPU architecture:

  • Bug 72805: mutex_delay() creating excess memory traffic, GCC mem barrier needed
    • This is now more generic mutex code, so it’s even more important to get it right. There’s a bunch of tricks that have been learned in other places (e.g. Linux kernel) in getting these things right. We need to get them right in MySQL too.
    • One of these tricks is in ensuring that the compiler doesn’t compile down spinloops to nothing.
  • Bug 72806: mutex_delay() missing x86 pause instruction optimization
    • This is actually a regression over 5.6.
    • On x86, there is an instruction (PAUSE) that tells the CPU that you’re in a spin loop and that it should yield resources in the CPU core to other threads (or thread, as HT CPUs only have 2 threads per core).
    • We have a different way of doing things on POWER, and I’ve got a patch for that too.
    • What’s interesting is reading the Intel CPU manual about the PAUSE instruction: whether it is a NO-OP or not depends on the CPU, so even benchmarking it will give you different answers on different CPUs.
    • I suspect that with this bug fixed, performance on Hyper Threaded Intel systems will improve.
  • Bug 72807: Set thread priority in my_pthread_fastmutex_lock
    • This is the POWER equivalent of the x86 PAUSE instruction.
    • I’ve found this patch to have a quite decent positive impact on sysbench point select performance.
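
To make the spinloop bugs above a bit more concrete, here is a minimal sketch in C. It is not the MySQL code and not the patches attached to those bugs; spin_delay() and the macro names are hypothetical, and the POWER priority hints ("or 1,1,1" for low, "or 2,2,2" for medium) are the ones I'd assume from how the Linux kernel does it.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: emits no CPU instruction, but tells GCC that
 * memory may have changed, so the loop below cannot be optimized away. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Per-architecture "this thread is only spinning" hint (assumed mapping). */
#if defined(__x86_64__) || defined(__i386__)
#  define SPIN_HINT_BEGIN() __asm__ __volatile__("pause")
#  define SPIN_HINT_END()   ((void)0)
#elif defined(__powerpc__) || defined(__powerpc64__)
#  define SPIN_HINT_BEGIN() __asm__ __volatile__("or 1,1,1") /* low SMT priority */
#  define SPIN_HINT_END()   __asm__ __volatile__("or 2,2,2") /* back to medium */
#else
#  define SPIN_HINT_BEGIN() ((void)0)
#  define SPIN_HINT_END()   ((void)0)
#endif

/* Hypothetical stand-in for a mutex_delay()-style helper.
 * Spins for `iterations` rounds and returns how many rounds it ran. */
static uint32_t spin_delay(uint32_t iterations)
{
    uint32_t done = 0;

    for (uint32_t i = 0; i < iterations; i++) {
        SPIN_HINT_BEGIN();   /* give core resources to sibling threads */
        COMPILER_BARRIER();  /* keep the compiler from deleting the loop */
        done++;
    }
    SPIN_HINT_END();         /* restore priority before doing real work */
    COMPILER_BARRIER();

    assert(done == iterations);
    return done;
}
```

The compiler barrier and the CPU hint do different jobs: the first only constrains GCC, the second only talks to the CPU, which is why they ended up as separate bug reports.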

There were also the bugs I mentioned in my MySQL 5.6 on POWER blog post. Notably, I had to port Yasufumi’s memory barrier patch from 5.6 to 5.7. My port is incomplete (I can still crash mysqld without too much trying) but I’ve deemed it currently “good enough for benchmarking” and it’s attached to bug 47213 (I hope to spend some time fixing it up soon too). I don’t think I’m missing anything that’s going to have a major performance impact – so while not suitable for production use, it’s good enough to poke some benchmarks at.

So… I’m close to the point where I’ll share my patch for MySQL 5.7, but I’m really wanting to solve the last couple of issues before doing so. The majority of patches are attached to bug reports and get 99% of the way.

Amazingly enough, MySQL 5.7 works fairly well on POWER “out of the box”, and with sysbench point selects, I could quite easily get 320kQPS on a 24 core POWER8 with SMT8 mode without changing a single line of code or doing anything special. This alone is an impressive result when compared to the previous record on both POWER and other CPU architectures with MySQL 5.6 that had been optimized for POWER (while out-of-the-box MySQL 5.7 has not).

For my benchmarks, I’m doing the same procedure, workload and basic my.cnf settings that Dimitri has used and written about, so I won’t repeat that here.

With my preliminary patch for MySQL 5.7.4-m14 to have it work well on POWER, on the same system I was using for my MySQL 5.6 benchmarks, I could easily match and indeed exceed the previous published maximum sysbench point select results (I got ~630kQPS). Consider this number a bit preliminary as my patch isn’t completely solid, but it does mean that we’re in the right ballpark for MySQL 5.7 performance, which is great news!

So, you might just say “Mission Accomplished” and be done with it. Well… there was one issue: with the maximum numbers I was getting there was still 30-40% idle CPU on the POWER8 machine.

Now… you could just use that idle 30-40% of total CPU to do other things (solving Sudoku in SQL for example) but that’s no fun.

MySQL 5.6 on POWER (patch available)

The following sentence is brought to you by IBM Legal. The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

Okay, now that is out of the way….

If you’re the kind of person who follows the MySQL bugs database closely or subscribes to the MySQL Internals mailing list, you may have worked out that I’ve spent a small amount of time poking at MySQL on modern POWER systems.

Unlike Intel CPUs, POWER CPUs require explicit memory barriers to synchronize memory state between different CPUs. This means that when you’re implementing synchronization primitives, you have one extra thing to get right.

Luckily, if you use straight pthread mutexes, this is already taken care of. Unluckily, there are some optimizations in MySQL that don’t use straight pthread mutexes and so may be problematic on non-Intel CPUs. A few of these issues have sneaked into MySQL over the past few years. The most problematic area was around the optimized mutexes in InnoDB (you can use the pthread_mutex fallback code, but it’s less performant).
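
As a rough illustration of what “explicit memory barriers” means in practice, here is a small sketch using C11 atomics. It is not InnoDB code; struct mailbox and both functions are made-up names. The release store and acquire load are what tell the compiler to emit the (lighter weight) barriers a CPU like POWER needs, whereas on x86 the same code typically needs no extra barrier instructions, which is how this kind of bug goes unnoticed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox shared between two threads. */
struct mailbox {
    int         payload;  /* plain data, protected by `ready` */
    atomic_bool ready;    /* flag that publishes `payload` */
};

/* Writer: store the data, then publish it with release ordering so the
 * payload write cannot become visible after the flag. */
static void mailbox_publish(struct mailbox *m, int value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Reader: check the flag with acquire ordering so the payload read
 * cannot be satisfied before the flag read. */
static bool mailbox_try_consume(struct mailbox *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

With a plain (non-atomic) flag and no barriers, a reader on a weakly ordered CPU could see the flag set and still read a stale payload.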

Luckily, I both knew where to look and there are good asserts throughout InnoDB code to help spot any other areas that I may not have initially thought of to look at. Coding defensively with a good amount of asserts is a good thing.

After not too much work, I have a set of patches that I’m fairly confident is correct and performs near as well as possible. Initially, I had a different patch that used heavyweight memory barriers in a lot of places, but big kudos to Yasufumi for posting a better patch than mine to bug 47213 – using the lighter weight barriers gives a decent performance boost.

One of the key patches is in the InnoDB mutex code to change the thread priority – i.e. a POWER equivalent to the x86 pause instruction. These are hints to the CPU that the thread being executed is in a spinloop and CPU resources should be allocated to other threads to make better forward progress.

After dragging Anton in to have a look and a think, this code may have motivated him to have a go at getting kernel support for adaptive mutexes, thus removing the need for this spin/sleep/yield/eep loop in InnoDB (at least on Linux).

So… I’ve spent the appropriate time filing bugs in the MySQL bug tracker for the things I’ve found. Feel free to track them yourself, they are:

  • Bug 72715: character set code endianness dependent on CPU type rather than endianness of CP
    • I don’t think this is an issue for us… or it could be that this is actually just incredibly untested code in the MySQL Server. It’s also not POWER specific, although was caught by the Migration Assistant which is part of the Advanced Toolchain from IBM.
  • Bug 72718: CACHE_LINE_SIZE in innodb should be 128 on POWER
    • I contributed a patch that’s a simple #ifdef for CPU type. Those who care about other CPU architectures should chime in with the correct value for them.
    • There’s other places in InnoDB where there’s some padding that don’t use this define, I need to file a bug for that.
  • Bug 72754: Set thread priority in InnoDB mutex spinloop
    • This makes a big difference when you have mutex contention and SMT (Symmetric Multi-Threading) enabled (on POWER, you can dynamically change SMT levels at runtime).
    • I’ve contributed a preliminary patch that isn’t generic. I should go and fix that.
  • Bug 72755: InnoDB mutex spin loop is missing GCC barrier
    • This also applies to x86 (and indeed all platforms). If GCC gets a bit smarter, the current code could compile down to nothing, which is exactly what you don’t want from a spinloop. The correct thing to do is to have a GCC memory barrier (not CPU one) to ensure that the compiler doesn’t optimize away the spinning.
    • I’ve contributed a patch, may need #ifdef GCC added.
  • Bug 72809: InnoDB Linux native aio setup missing barrier after setup
    • This appears to be a “POWER8 is fast” related bug :)
    • Patch contributed.
  • Bug 72811: Set NUMA mempolicy for optimum mysqld performance
    • Not POWER specific.
    • I’ve contributed a patch that sets NUMA memory allocation policy inside mysqld rather than having to run “numactl” manually
  • Bug 47213: InnoDB mutex/rw_lock should be conscious about memory ordering other than Intel
    • Originally filed by Yasufumi back in 2009.
    • Some good discussion going on here to ensure the patch is correct. This is the kind of patch that requires more review than it takes to write it.
    • This patch would fix the majority of problems for non-Intel CPU architectures.
    • Thanks to Yasufumi for providing an updated patch, it helped a lot!
  • Bug 72544: Incorrect locking for global_query_id
    • I found a bug. Rather benign and not POWER specific.
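
For the CACHE_LINE_SIZE bug (72718) in the list above, the idea of “a simple #ifdef for CPU type” could look something like the sketch below. This is not the contributed patch: the 128 for POWER comes from the bug title, the 64 fallback is my assumption for everything else, and struct padded_counter is a made-up example of the kind of padding the define is used for.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Pick the cache line size by CPU type.
 * 128 on POWER (per the bug report); 64 elsewhere is an assumption. */
#if defined(__powerpc__) || defined(__powerpc64__)
#  define CACHE_LINE_SIZE 128
#else
#  define CACHE_LINE_SIZE 64
#endif

/* Hypothetical hot counter that gets a whole cache line to itself, so
 * two counters updated by different CPUs never share a line. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) unsigned long value;
};

/* The alignment also rounds the struct size up to a full line. */
static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
              "counter should occupy exactly one cache line");
```

Padding that hard-codes 64 instead of using the define is exactly the sort of thing that silently does the wrong thing on a CPU with a larger line.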

Want to run MySQL 5.6.17 on POWER? Get my MySQL 5.6.17 patch here: https://flamingspork.com/mysql/mysql-5.6.17-POWER.patch

My accumulation of 5.6 patches seems fairly reliable. I’d test before putting into production, and I’d certainly love to know any problems you hit.

Get the quilt series of patches here: https://flamingspork.com/mysql/mysql-5.6.17-POWER-patches.tar.gz

I have, of course, done the legal wrangling for the Oracle Contributor Agreement (remarkably painless) and am working on making the patches completely acceptable to be merged into MySQL.

Awesome MySQL 5.7 improvements

Recently, I’ve had reason to poke at MySQL performance on some pretty cool hardware. Comparing MySQL 5.6 to MySQL 5.7 is a pretty interesting thing to do when you have many CPU cores.

The improvements to creating read views in InnoDB are absolutely huge for small statements with large concurrency – MySQL 5.7 completely removes this as a bottleneck – as much as doubling maximum SQL queries per second, which is a pretty impressive improvement.

I haven’t poked at the similar improvements in Percona Server on this hardware setup – so I can only really guess as to the performance characteristics of it… If comparing to older MySQL versions, Percona Server 5.5 is likely to outperform MySQL 5.5 thanks to this optimization.

But I have to say… MySQL 5.7 is impressive in its concurrency improvements.

Efficiently writing to a log file from multiple threads

There’s a pattern I keep seeing in threaded programs (or indeed multiple processes) writing to a common log file. This is more of an antipattern than a pattern, and is often found in code that has existed for years.

Basically, it’s having a mutex to control concurrent writing to the log file. This is something you completely do not need.

The write system call takes care of it all for you. All you have to do is construct a buffer with your log entry in it (in C, malloc a char[] or have one per thread, in C++ std::string may do), open the log file with O_APPEND and then make a single write() syscall with the log entry.

This works for just about all situations you care about. If doing multi megabyte writes (a single log entry with multiple megabytes? ouch) then you may get into trouble on some systems and get partial writes (IIRC it may have been MacOS X and 8MB) and O_APPEND isn’t exactly awesome on NFS.

But, if what you’re wanting to do is implement something like a general query log, a slow query log or something like that, then you probably want to use this trick rather than, say, taking a pthread_mutex lock while you do malloc(), snprintf() and write(2).
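
A minimal sketch of the trick in C, assuming a POSIX system and a local filesystem. The function names log_open() and log_entry() and the 1024 byte entry limit are made up for illustration; the point is that each thread formats into its own buffer and then issues exactly one write() on a descriptor opened with O_APPEND, with no mutex anywhere.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (or create) the log so that every write lands at end of file. */
static int log_open(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Format one entry into a per-call stack buffer, then hand it to the
 * kernel in a single write(). Safe to call from many threads at once.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t log_entry(int fd, const char *fmt, ...)
{
    char    buf[1024];   /* hypothetical maximum entry size */
    va_list ap;
    int     len;

    assert(fmt != NULL);

    va_start(ap, fmt);
    len = vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);

    if (len < 0)
        return -1;
    if ((size_t)len >= sizeof(buf))
        len = (int)sizeof(buf) - 1;   /* entry was truncated to fit */

    return write(fd, buf, (size_t)len);
}
```

Usage is just log_open() once at startup and log_entry(fd, "...\n", ...) from wherever; the formatting happens outside any lock because there is no lock.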

When refactoring parts of Drizzle, we found this done the wrong way in a whole bunch of places in the MySQL server, largely explaining why things like the slow query log and general query log were such a huge drain on database server performance.

It’d be neat to see someone fix that.

Caring about stack usage

It may not be surprising that there’s been a few projects over the years that I’ve worked on where we’ve had to care about stack usage (to varying degrees).

For threaded userspace applications (e.g. MySQL, Drizzle) you get a certain amount of stack per thread – and you really don’t want to bust that. For a great many years now, there’s been both a configuration parameter in MySQL to set how much stack each thread (connection) gets as well as various checks in the source code to ensure there’s enough free stack to do a particular operation (IIRC open_table is the most hairy one of this in MySQL).

For the Linux Kernel, stack usage is a relatively (in)famous problem… although by now just about every real problem has been fixed and merely mentioning it is probably just the influence of the odd grey beard hairs I’m pretending not to notice.

In a current project I’m working on, it’s also something we have to care about.

It turns out that GCC has a few nice things to help you prevent unbounded stack usage or runaway stack usage. There’s two warnings you can enable.

There’s -Wstack-usage=len which will throw warnings on unbounded stack usage (e.g. array on stack sized based on an argument to the function), where stack usage is greater than len and when stack usage may exceed len.

There’s also -Wframe-larger-than=len which is based on calculation for a particular stack frame, as opposed to -Wstack-usage=len, which could be based on several stack frames.
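
Here is a small sketch of the kind of code these warnings are aimed at. Both functions are made up for illustration, and the limit of 128 in the compile command is an arbitrary “conservative” value; exactly which warnings you get will depend on GCC version and optimization level.

```c
/* Try: gcc -c -Wstack-usage=128 -Wframe-larger-than=128 stack.c */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unbounded: the array on the stack is sized by an argument, so the
 * compiler cannot put a limit on this function's stack usage. */
static size_t copy_len_unbounded(const char *src, size_t n)
{
    char tmp[n ? n : 1];              /* variable length array */
    const char *nul;

    memcpy(tmp, src, n);
    nul = memchr(tmp, '\0', n);
    return nul ? (size_t)(nul - tmp) : n;
}

/* Bounded: a fixed-size buffer with the input clamped to fit. Stack
 * usage is now a known constant (though still above a limit of 128). */
#define COPY_MAX 256

static size_t copy_len_bounded(const char *src, size_t n)
{
    char tmp[COPY_MAX];
    const char *nul;

    assert(src != NULL);
    if (n > sizeof(tmp))
        n = sizeof(tmp);

    memcpy(tmp, src, n);
    nul = memchr(tmp, '\0', n);
    return nul ? (size_t)(nul - tmp) : n;
}
```

The first function is the “array on stack sized based on an argument” case; the second is the usual fix, where the worst case is at least something you can read off the source.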

Odds are, you may get some warnings in your project if you set this to what you would consider “conservative” values. Now, if this is ever going to explode at runtime is something that’s left as an exercise for the reader, but enabling these warnings is pretty easy and a simple way to help find and prevent some issues.

After all, having your software explode for running off the end of the stack is just a tad embarrassing.

More interesting things in the POWER8/OpenPOWER world

There’s been a couple of interesting things published about things I’ve been working on/with recently.

The OpenPOWER Foundation did a press release on initial white box server design for POWER8. With the hardware, there’s a software stack: firmware and operating system stack developed by IBM, Google and Canonical. Basically, there’s going to be 96 threads per socket with 230GB/sec of memory bandwidth in your cloud, yo.

Obviously, there’s a bunch of stuff that can be talked about more in the future, post release of all these things. But it’s pretty cool to be able to see some information heading out the door in the lead up to POWER8 systems being available.

and now for something completely different…

As many of you know, I’ve been working in the MySQL world for quite a while now. In fact, it was nearly 10 years ago when I first started hacking on MySQL Cluster at MySQL AB.

Most recently, I was at Percona which was a wonderful journey where over my nearly three years there the company at least doubled in size, launched several new software products and greatly improved the quality and frequency of releases.

However the time has come for something completely different. The MySQL world is rather mature, the future of Percona software is bright and, well, I could do with poking into something rather different.

So a couple of weeks ago I started at IBM in the Linux Technology Centre working on KVM on POWER and related things. No doubt there’ll be interesting things to blog about as time goes on, but it’s about time I posted my change of employment :)

Converting MySQL trees to git

I have put up a set of scripts on github: https://github.com/stewartsmith/bzr-to-git-conversion-scripts. Why do I need these? Well… if only bzr fast-export|git fast-import worked flawlessly for large, complex and old trees. It doesn’t.

Basically, when you clone this repo you can run “./sync-BLAH.sh” and it’ll pull BZR trees for the project, convert to git and clean things up a bit. You will likely have to edit the sync-BLAH.sh scripts as I have them pointed at branches on my own machine (to speed up the process, not having to do fresh BZR branches of MySQL trees over the network is a feature – it’s never been fast.). You’ll also want to edit the git remotes to point where you want git trees to end up.

I’ve done it for:

What problems did I hit? Well… the first is performance, things are slow unless you tweak a bunch of knobs, and then it’s just rather slow rather than slow. So in the empty git repo I set core.compression=1, which makes zlib a whole lot faster.

I naturally give the correct incantation to bzr fast-export to munge tag names appropriately, set a git branch name (each BZR branch ends up as a git branch) and use a marks file (this speeds up incremental syncs).

For one of these branches I was importing, BZR had allowed the invalid committer of “billy-earney billy.earney@gmail.com\n <>” – yes, a newline in the committer. This messes up the fast-import format so I have to run the entire fast-export output through sed to clean it up.

We then use bzr fast-import-filter to apply a user map – which is me looking at the appropriate committers and cleaning them up so that we get better attribution in the resulting git trees as well as cleaning up some errors in the bzr tree so that Git likes them (most notably, missing < or (not and) > around email addresses). The user map is fairly Percona specific, but there’s at least one or two for Oracle committers too.

Next, I pass the output through pv(1) – to do two things: monitor the output to see that it’s still going, and to have a transfer buffer so that git fast-import doesn’t stall waiting for output – amazingly enough, this gave a decent speed boost to import speed.

Finally, when we’re done doing the import of all of the revisions for all of the bzr branches, if this is our first run, we set the HEAD ref to the last BZR branch name and then do a git repack. Through experimentation, I’ve found that “git repack -AdfF --depth=100 --window=500” is what gives me the smallest size possible.

CFP: Developer, Testing, Release and CI Automation miniconf @ linux.conf.au 2014

I have just opened the Call For Papers for the Developer, Testing, Release and Continuous integration Automation miniconf at linux.conf.au 2014.

This miniconf is all about improving the way we produce, collaborate, test and release software.

We want to cover tools and techniques to improve the way we work together to produce higher quality software:

– code review tools and techniques (e.g. gerrit)
– continuous integration tools (e.g. jenkins)
– CI techniques (e.g. gated trunk, zuul)
– testing tools and techniques (e.g. subunit, fuzz testing tools)
– release tools and techniques: daily builds, interacting with distributions, ensuring you test the software that you ship.

All sessions are 30 minutes unless there is prior arrangement. Typically there is a VGA plug at the front of the room but if you have any specialized A/V requirements please enter them as notes at the end and we’ll see what we can do.

Submissions are open until November 20th, with notifications going out over the following 1-2 weeks.

Submit now.

Carbon footprint of interpreted languages

Thought from a good discussion with François at OSDC today, what is the carbon footprint of various languages? He mentioned that the carbon footprint of a new Haskell compiler release is remarkably non-trivial due to every Haskell package in Debian needing to be rebuilt.

So, I thought, what’s the impact of something like Python? (or Perl). Every machine running the code has to do the bytecode compilation/JIT/interpretation of that code so when, say, Ubuntu ships some new version of $random_desktop_thing_written_in_python, we’re actually compiling it well over 20 million times. That’s a remarkably non-trivial amount of CPU time (and thus CO2 emissions).

So, program in compiled languages such as C or C++ as doing so will save polar bears.

The road to Percona Server 5.6

Over a year ago now, I announced the first Percona Server 5.6 alpha on the Percona MySQL Performance Blog (Announcing Percona Server 5.6 Alpha). That was way back on August 14th, 2012 and it was based on MySQL 5.6.5 released in April.

I’m really happy now to point to the release of the first GA release of Percona Server 5.6 along with some really interesting benchmarks. We’ve certainly come a long way from that first alpha and I’m really happy that we’ve also managed to continue to release Percona Server 5.5 and Percona Server 5.1 releases on time and of high quality.

Over the same time frame that we’ve been working on Percona Server 5.6 we’ve increased the size of the company, improved development practices and grown enough that we’ve reorganised how development of software is managed to make it scale better. One thing I’m really, really pleased about is a culture of quality we’ve managed to nurture.

Keeping a culture of quality alive is something that requires constant nurturing. All too often I’ve seen pressure to ship sooner rather than stabler (yes, I just invented that word), and yes, we initially planned the GA of PS 5.6 earlier than we ended up shipping it, but we instead took the time to round out features and stability to ship something much better.

Now comes the effort of continuing good releases, promoting it and writing a Webinar to give next week.

A better set of Boost m4 macros

I just replaced the old Pandora boost m4 macros in a project with boost.m4 from https://github.com/tsuna/boost.m4 and it basically just solved all my problems with Boost and the whole set of distributions that I build for (everything from CentOS/RHEL 5 to Debian unstable).

I like things that other people maintain.

The end of Bazaar

I’ve used the Bazaar (bzr) version control system since roughly 2005. The focus on usability was fantastic and the team at Canonical managed to get the entire MySQL BitKeeper history into Bazaar – facilitating the switch from BitKeeper to Bazaar.

There were some things that weren’t so great. Early on when we were looking at Bazaar for MySQL it was certainly not the fastest thing when it came to dealing with a repository as large as MySQL. Doing an initial branch over the internet was painful and a much worse experience than BitKeeper was. The work-around that we all ended up using was downloading a tarball of a recent Bazaar repository and then “bzr pull” to get the latest. This was much quicker than letting bzr just do it. Performance for initial branch improved a lot since then, but even today it’s still not great – but at least it isn’t terrible like it once was.

The integration with Launchpad was brilliant. We never really used it for MySQL but for Drizzle the combination was crucial and helped us get releases out the door, track tasks and bugs and do code review. Parts of launchpad saw great development (stability and performance improved immensely) and others did not (has anything at all changed in blueprints in the past 5+ years?). Not running your own bugs db was always a win and I’m really sad to say that I still think Launchpad is the best bug tracker out there.

For both Drizzle and Percona, Bazaar was the right option as it was what MySQL was using, so people in the community already knew the tools. These days however… Git is the tool that there’s large familiarity with – even to the extent that Twitter maintains their MySQL branch in Git rather than in bzr.

Is Bazaar really no longer being developed? Here are graphs (from github actually) on the activity on Bazaar itself over the years:

[Screenshot from 2013-10-02 10:32:19]

[Screenshot from 2013-10-02 10:33:41]

You can easily see the drop off in commits and code changes. The last commit to trunk was 2 months ago and although there was the 2.6.0 release in August, in my opinion it wasn’t a very strong one (the first one I’ve had problems with in years).

So… git is the obvious successor and with such a strong community around GitHub, it kinda makes sense. I’m not saying that GitHub has caught up to Launchpad in terms of features or anything – it’s just that with Bazaar clearly no longer really being developed…. it may be the only option.

In fact, in my experiment of putting a mirror of Percona Server on GitHub, we already have a pull request mere days after I blogged about it. Migrating all of Percona development over to Git and Github may take some time, but it’s certainly time that we kicked the tyres on it and worked out how we’d do it without interrupting releases or development.

I’ve also thrown up a Drizzle tree and although it required some munging to get the conversion to happen, I’m kind of optimistic about it and I think that after a round of merging things, I’m tempted to very strongly advocate for us switching (which I don’t think there’ll be any opposition to).

When will Oracle move over their MySQL development? This I cannot say (as I don’t know and don’t make that call for them). There is a lot of renewed interest in code contribution by Oracle and moving to Git and GitHub may well be a very good way to encourage people.

The downside of git? Well… With BZR you could get away with not understanding pretty much every single bit of the internals. With git, I wish I was so lucky.

An Experimental GIT mirror of Drizzle

I’ve been mirroring a bunch of projects that have their source control in BZR up onto github recently. This turns out to be a bit harder than it sounds for a bunch of reasons that aren’t particularly interesting (although having a commit in the bzr repo where the name of the committer has a newline in it is among the more interesting).

Run on over to https://github.com/stewartsmith/drizzle to check it out. I’ve put up Drizzle 7.0, 7.1 and 7.2 branches.

Who is working on MySQL 5.7?

First I find out the first commit that is in 5.7 that isn’t in 5.6 (using bzr missing) and then look at the authors of all of those commits. Measuring the number of commits is a really poor metric as it does not measure the complexity of the code committed, and if your workflow is to revise a patchset before committing, you get far fewer commits than if you commit 10 times a day.

There are a good number of people who are committing a lot of code to the latest MySQL development tree. (Sorry for the annoying layout of “count. number-of-commits name”)

  1. 1022 Magnus Blaudd
  2. 723 Jonas Oreland
  3. 329 Marko Mäkelä
  4. 286 Krunal Bauskar
  5. 230 Tor Didriksen
  6. 218 John David Duncan
  7. 205 Vasil Dimov
  8. 197 Sunny Bains
  9. 166 Ole John Aske
  10. 141 Marc Alff
  11. 141 Frazer Clement
  12. 140 Jimmy Yang
  13. 131 Joerg Bruehe
  14. 129 Jon Olav Hauglid
  15. 125 Annamalai Gurusami
  16. 106 Martin Skold
  17. 104 Nuno Carvalho
  18. 103 Georgi Kodinov
  19. 102 Pekka Nousiainen

There’s also a good number who have 50-100 commits:

  1. 99 Mauritz Sundell
  2. 97 Bjorn Munch
  3. 92 Craig L Russell
  4. 85 Andrei Elkin
  5. 81 Mattias Jonsson
  6. 73 Nirbhay Choubey
  7. 71 Roy Lyseng
  8. 68 Kevin Lewis
  9. 66 Rohit Kalhans
  10. 65 Guilhem Bichot
  11. 61 Sayantan Dutta
  12. 59 Akhila Maddukuri
  13. 58 Jorgen Loland
  14. 57 Martin Zaun
  15. 56 Harin Vadodaria
  16. 55 Inaam Rana
  17. 53 Venkatesh Duggirala
  18. 53 Venkata Sidagam
  19. 52 Gleb Shchepa
  20. 51 Norvald H. Ryeng
  21. 51 Jan Wedvik
  22. 50 Tatjana Azundris Nuernberg

And there’s even more with less than 50:

  1. 49 Manish Kumar
  2. 49 Alexander Barkov
  3. 48 Shivji Kumar Jha
  4. 48 Martin Hansson
  5. 42 Maitrayi Sabaratnam
  6. 40 Satya Bodapati
  7. 39 Horst Hunger
  8. 38 Neeraj Bisht
  9. 34 Yasufumi Kinoshita
  10. 34 prabakaran thirumalai
  11. 34 Kristofer Pettersson
  12. 33 Evgeny Potemkin
  13. 33 Dmitry Lenev
  14. 33 Chaithra Gopalareddy
  15. 33 Alexander Nozdrin
  16. 31 Hemant Kumar
  17. 31 Allen lai
  18. 31 Aditya A
  19. 30 Nisha Gopalakrishnan
  20. 30 Anirudh Mangipudi
  21. 29 Tanjot Uppal
  22. 28 Christopher Powers
  23. 27 Sujatha Sivakumar
  24. 27 Ashish Agarwal
  25. 25 Olav Sandstaa
  26. 25 Mayank Prasad
  27. 24 Anitha Gopi
  28. 24 Ahmad Abdullateef
  29. 23 Hery Ramilison
  30. 22 Vamsikrishna Bhagi
  31. 22 Praveenkumar Hulakund
  32. 22 Pedro Gomes
  33. 20 Sergey Glukhov
  34. 20 Libing Song
  35. 19 Vinay Fisrekar
  36. 19 Harin Vadodaria
  37. 18 Raghav Kapoor
  38. 18 Luis Soares
  39. 18 Gopal Shankar
  40. 18 Astha Pareek
  41. 17 viswanatham gudipati
  42. 17 Thayumanavar
  43. 17 Ramil Kalimullin
  44. 16 Oystein Grovlen
  45. 15 Dmitry Shulga
  46. 15 Amit Bhattacharya
  47. 15 Akhil Mohan
  48. 14 Ravinder Thakur
  49. 14 Kent Boortz
  50. 13 Bernd Ocklin
  51. 12 Bill Qu
  52. 11 Shaohua Wang
  53. 10 Sven Sandberg

There’s also a good number with fewer than 10 (31 names actually), which is encouraging: these are likely people who are not involved every day in development of new code (maybe QA, build etc), which probably means that (at least internally) contributing code isn’t really a big problem (and as I’ve shown previously, the barriers to external contributions between Oracle MySQL and MariaDB appear to result in roughly the same amount of code from people outside those companies).

There are 125 names here in total, with 19 having over 100 commits, 22 with 50-100 commits, another 53 with 10-50 commits and 31 with <10. So it’s possible to say that there are at least 125 people at Oracle working on MySQL – and I know there are awesome people who are missing from this list as their work doesn’t result in committing code directly to the tree.

Who is working on MariaDB 10.0?

There was some suggestion after my previous post (Who works on MariaDB and MySQL?) that I look at MariaDB 10.0 – so I have. My working was very simple, in a current MariaDB 10.0 BZR tree (somewhat beyond 10.0.3), I ran the following command:

bzr log -n0 -rtag:mariadb-10.0.0..|egrep '(author|committer): '| \
  sed -e 's/^\s*//; s/committer: //; s/author: //'| \
  sort -u|grep -iv oracle

MariaDB foundation/MontyProgram/SkySQL:

  1. Alexander Barkov
  2. Alexey Botchkov
  3. Daniel Bartholomew
  4. Elena Stepanova
  5. Igor Babaev
  6. Jani Tolonen
  7. knielsen
  8. Michael Widenius
  9. sanja
  10. Sergei Golubchik
  11. Sergey Petrunya
  12. Sergey Vojtovich
  13. timour
  14. Vladislav Vaintroub

Elsewhere:

  1. Kentoku SHIBA (4 commits)
  2. Lixun Peng (1 commit)
  3. Olivier Bertrand (212 commits)

From Oracle (i.e. revisions merged from Oracle MySQL):

  • 81 names (which I won’t list here as 81 is a lot)

The results are no different if you go back to the first revision that is different between MariaDB 5.5 and 10.0 (found using bzr missing). Even when grepping through the bzr log for things such as “patch by”, “contribution” or “originally” I can only find 1 or two more names as original authors for patches (about the same as I can for patches going into the Oracle tree).

Please point me to revisions (revid is best way) that come from outside contributors as then I really can update this to show that there’s a larger developer community.

The current development version of Drizzle (7.2) has just as many contributors as the MariaDB development version (10.0) – although Drizzle does have fewer commits.

nanomysql – tiny MySQL client lib

I recently got pointed towards https://github.com/shodanium/nanomysql/ which is a tiny (less than 400 lines of C++) MySQL client library which is GPL licensed.

If you need to link into non-GPL compatible code, there is the (slightly larger and full featured) libdrizzle library. But if you want something *tiny* and are okay with GPL, then nanomysql may be something to look at.

Who works on MariaDB and MySQL?

Looking at the committers/authors of patches in the bzr tree for MariaDB 5.5.31.

Non Oracle Contributors:

  1. Alexander Barkov
  2. Alexey Botchkov
  3. Elena Stepanova
  4. Igor Babaev
  5. knielsen
  6. Michael Widenius
  7. sanja
  8. Sergei Golubchik
  9. Sergey Petrunya
  10. timour
  11. Vladislav Vaintroub

Oracle (as they pull Oracle changes):

  1. Aditya A
  2. Akhila Maddukuri
  3. Alexander Nozdrin
  4. Anirudh Mangipudi
  5. Annamalai Gurusami
  6. Astha Pareek
  7. Balasubramanian Kandasamy
  8. Chaithra Gopalareddy
  9. Daniel Fischer
  10. Gleb Shchepa
  11. Harin Vadodaria
  12. Hery Ramilison
  13. Igor Solodovnikov
  14. Inaam Rana
  15. Jon Olav Hauglid
  16. kevin.lewis
  17. Krunal Bauskar
  18. Marc Alff
  19. Marko Mäkelä
  20. Mattias Jonsson
  21. Murthy Narkedimilli
  22. Neeraj Bisht
  23. Nisha Gopalakrishnan
  24. Nuno Carvalho
  25. Olav Sandstaa
  26. Pedro Gomes
  27. prabakaran thirumalai
  28. Praveenkumar Hulakund
  29. Ravinder Thakur
  30. Satya Bodapati
  31. sayantan.dutta
  32. Shivji Kumar Jha
  33. Sujatha Sivakumar
  34. Sunanda Menon
  35. Sunny Bains
  36. Thayumanavar
  37. Tor Didriksen
  38. Venkata Sidagam
  39. Venkatesh Duggirala
  40. Yasufumi Kinoshita

Observations:

  1. All the non-Oracle contributors work for SkySQL (and worked for Monty Program before that)
  2. Even when you go back to MariaDB 5.5.23 I can only find evidence for a maximum of 2-3 external contributions of code to MariaDB since then.
  3. In the same time frame (5.5.23-5.5.32) I see 1 or 2 going into Oracle trees, so it’s roughly the same.
  4. If you look at the contributors from Oracle over 5.5.23 to 5.5.32 there are closer to twice as many as the 40 listed above.

Somebody please correct me if I’m wrong here… perhaps MariaDB guys are just really bad at clearly marking commits that come from elsewhere? I’ve looked for “patch.*by”, “original” and “ontributed” and only turned up the above.