MySQL Cluster on POWER8

So, I’ve written previously on MySQL on POWER, and today is a quick bit of news about MySQL Cluster on POWER – specifically MySQL Cluster 7.3.7.

I ran into three main issues in getting some flexAsync benchmark results. One of them was the fact that I wanted to do this in the middle of all the POWER8 machines I usually use moving buildings (hard to run benchmarks when computers are packed up in boxes on a truck).

The next issue was that ndbmtd (the multi-threaded data node) needs memory barriers for the magic message passing stuff between threads. So, that’s pretty easy (about an eight line patch).

The next issue was in the results from flexAsync, it turns out 32bit math is a bad idea with results from my POWER8 box.

My preliminary performance numbers are fairly promising (actually… what is the world record for a single machine and NDB these days? Single data node?). I think there’s a bit more low hanging fruit and a couple more things that are a bit more involved.

Bugs with patches:

  • Bug 74782 – compile fix (memory barriers for POWER)
  • Bug 74781 – flexAsync uses 32bit math, leading to incorrect summary on POWER8

MySQL 5.6 on POWER (patch available)

The following sentence is brought to you by IBM Legal. The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

Okay, now that is out of the way….

If you’re the kind of person who follows the MySQL bugs database closely or subscribes to the MySQL Internals mailing list, you may have worked out that I’ve spent a small amount of time poking at MySQL on modern POWER systems.

Unlike Intel CPUs, POWER CPUs require explicit memory barriers to synchronize memory state between different CPUs. This means that when you’re implementing synchronization primitives, you have one extra thing to get right.

Luckily, if you use straight pthread mutexes, this is already taken care of. Unluckily, there are some optimizations in MySQL that don’t use straight pthread mutexes and so may be problematic on non-Intel CPUs. A few of these issues have sneaked into MySQL over the past few years. The most problematic area was around the optimized mutexes in InnoDB (you can use the pthread_mutex fallback code, but it’s less performant).

Luckily, I both knew where to look and there are good asserts throughout InnoDB code to help spot any other areas that I may not have initially thought of to look at. Coding defensively with a good amount of asserts is a good thing.

After not too much work, I have a set of patches that I’m fairly confident is correct and performs near as well as possible. Initially, I had a different patch that used heavyweight memory barriers in a lot of places, but big kudos to Yasufumi for posting a better patch than mine to bug 47213 – using the lighter weight barriers gives a decent performance boost.

One of the key patches is in the InnoDB mutex code to change the thread priority – i.e. a POWER equivalent to the x86 pause instruction. These are hints to the CPU that the thread being executed is in a spinloop and CPU resources should be allocated to other threads to make betterr forward progress.

After dragging Anton in to have a look and a think, this code may have motivated him to have a go at getting kernel support for adaptive mutexes, thus removing the need for this spin/sleep/yield/eep loop in InnoDB (at least on Linux).

So… I’ve spent the appropriate time filing bugs in the MySQL bug tracker for the things I’ve found. Feel free to track them yourself, they are:

  • Bug 72715: character set code endianness dependent on CPU type rather than endianness of CP
    • I don’t think this is an issue for us… or it could be that this is actually just incredibly untested code in the MySQL Server. It’s also not POWER specific, although was caught by the Migration Assistant which is part of the Advanced Toolchain from IBM.
  • Bug 72718: CACHE_LINE_SIZE in innodb should be 128 on POWER
    • I contributed a patch that’s a simple #ifdef for CPU type. Those who care about other CPU architectures should chime in with the correct value for them.
    • There’s other places in InnoDB where there’s some padding that don’t use this define, I need to file a bug for that.
  • Bug 72754: Set thread priority in InnoDB mutex spinloop
    • This makes a big difference when you have mutex contention and SMT (Symmetric Multi-Threading) enabled (on POWER, you can dynamically change SMT levels at runtime).
    • I’ve contributed a preliminary patch that isn’t generic. I should go and fix that.
  • Bug 72755: InnoDB mutex spin loop is missing GCC barrier
    • This also applies to x86 (and indeed all platforms). If GCC gets a bit smarter, the current code could compile down to nothing, which is exactly what you don’t want from a spinloop. The correct thing to do is to have a GCC memory barrier (not CPU one) to ensure that the compiler doesn’t optimize away the spinning.
    • I’ve contributed a patch, may need #ifdef GCC added.
  • Bug 72809: InnoDB Linux native aio setup missing barrier after setup
    • This appears to be a “POWER8 is fast” related bug :)
    • Patch contributed.
  • Bug 72811: Set NUMA mempolicy for optimum mysqld performance
    • Not POWER specific.
    • I’ve contributed a patch that sets NUMA memory allocation policy inside mysqld rather than having to run “numactl” manually
  • Bug 47213: InnoDB mutex/rw_lock should be conscious about memory ordering other than Intel
    • Originally filed by Yasufumi back in 2009.
    • Some good discussion going on here to ensure the patch is correct. This is the kind of patch that requires more review  than it takes to write it.
    • This patch would fix the majority of problems for non-Intel CPU architectures.
    • Thanks to Yasufumi for providing an updated patch, it helped a lot!
  • Bug 72544: Incorrect locking for global_query_id
    • I found a bug. Rather benign and not POWER specific.

Want to run MySQL 5.6.17 on POWER? Get my MySQL 5.6.17 patch here:

My accumulation of 5.6 patches seems fairly reliable. I’d test before putting into production, and I’d certainly love to know any problems you hit.

Get the quilt series of patches here:

I have, of course, done the legal wrangling for the Oracle Contributor Agreement (remarkably painless) and am working on making the patches completely acceptable to be merged into MySQL.