MySQL 5.6 Performance on POWER8

The following sentence is brought to you by IBM Legal: The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

My previous post covered the work needed to get MySQL 5.6.17 running reliably on modern POWER systems. The patch to MySQL 5.6.17 that’s needed is available here.

For those who don’t know, POWER8 is the latest Power Architecture processors from IBM (my employer). These chips will be available in systems from IBM in June 2014 (i.e. Real Soon Now(TM)). There’s some fairly impressive specs and numbers (see Wikipedia and elsewhere) – but what could this mean for actual applications?

Well, it turns out that MySQL is a pretty big thing in some target markets for POWER8, and inspired by Dimitri’s impressive benchmark numbers, I thought we should have a go on POWER8.

Firstly, I focused on MySQL 5.6 as it is the current stable release. MySQL 5.7 will be the subject of a future blog post.

The first step was to ensure that MySQL 5.6 worked correctly on POWER. My previous blog post covered the few bugs I ran into and filed (often  with patches). This wasn’t too hard and I’m fairly confident the bug fixes are simple enough to get into MySQL 5.6 – I can’t comment on what would be/could be “officially supported”, that’s a business discussion :)

In order to ensure that my patch was not only correct but performing well, I needed a benchmark. For my initial benchmark. I chose sysbench point selects (i.e. read only key lookups), which should show the theoretical maximum queries per second you could pump through the MySQL Server as well as really stressing the mutex code, helping ensure it was not only correct, but performing well.

A simple comparison of my early patch that used heavyweight memory barriers versus Yasufumi’s patch that used more lightweight ones showed that using heavyweight barriers could be as much as a 50% performance hit – so getting this code right is important.

To add to the fun, the POWER8 processor has a few parameters you can tweak. There is the SMT mode, which dictates how many threads per core there are. This can be changed at runtime. You can be in SMT=off, SMT=2, SMT=4 or SMT=8. Typically, only some workloads can benefit from SMT8 rather than SMT4. There is also DSCR, which is data prefetching. For sysbench point selects, I’ve found we do slightly better (around 10%) when DSCR is set to 1 rather than zero – but YMMV on other benchmarks.

In my experiments, I’ve found that SMT4 or SMT8 seems to be the best bang for buck for MySQL workloads on POWER8. With SMT=2 rather than off, I’ve seen a ~50% performance boost in sysbench point select results. With SMT=4 I’ve seen another 50% boost (i.e. roughly double SMT=off performance). The benefit of SMT8 for MySQL 5.6 (and the 5.6 part is crucial here) may be minimal, especially for this benchmark. This is mostly due to hitting heavy mutex contention inside the MySQL server rather than anything else.

POWER8 systems come in either single or dual socket, with the number of cores being a total of 4, 6, 8, 10, 12, 16, 20 or 24 depending on configuration of the system (go check IBM web site for specifics of what’s available in what model). This means with SMT8, a dual socket, 24 core POWER8 system has 192 hardware threads – the system I was using for these benchmarks.With this number of cores and hardware threads, those familiar with MySQL on multi core systems may already have an inkling that using the full capacity of such a system may be hard for MySQL.

Certainly for old versions of MySQL (such as 5.0 or 5.1) you’re going to get nowhere near full system utilization on POWER8. For MySQL 5.6 (and in the future, 5.7) you have a much better hope.

Before anyone asks, yes, I used jemalloc for most of my benchmarks and it helps by giving a single digit percent performance increase (around 3-4%).

The bottlenecks inside MySQL 5.6 for sysbench point select workload are fairly well documented, so at best we may be striving to equal the performance of other CPU architectures rather than get too much higher simply due to hitting mutex contention in creating read views inside InnoDB. So the maximum performance will be a function of individual core CPU speed and the speed at which a lock can be acquired (i.e. related to how quick you can bounce a cacheline with a lock between cores).

This is exactly what I found on POWER8 with MySQL 5.6 – you hit the same bottleneck on POWER8 as you do everywhere else – creating read views in InnoDB.

That being said, my maximum sysbench point select results on POWER8 was 344kQPS. This not only matches but exceeds the previous record holder by quite a decent amount.

This number was across 8 tables with mysqld bound to a single NUMA node (6 cores) and sysbench bound to another NUMA node (6 cores) on the same socket. For this benchmark, due to the mutex contention, bringing the second socket into play didn’t improve performance. For other benchmarks, (e.g. standard sysbench read only) it seems to scale with more CPU cores much better (no doubt the subject of a future blog post).

Single table sysbench point select was also impressive at 335kQPS – you only got an additional 10kQPS by going to 8 tables! All of these results were with SMT4 and DSCR=1, which seems to be the best configuration for this type of workload.

Up next: MySQL 5.7 on POWER8.

MySQL 5.6 on POWER (patch available)

The following sentence is brought to you by IBM Legal. The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.

Okay, now that is out of the way….

If you’re the kind of person who follows the MySQL bugs database closely or subscribes to the MySQL Internals mailing list, you may have worked out that I’ve spent a small amount of time poking at MySQL on modern POWER systems.

Unlike Intel CPUs, POWER CPUs require explicit memory barriers to synchronize memory state between different CPUs. This means that when you’re implementing synchronization primitives, you have one extra thing to get right.

Luckily, if you use straight pthread mutexes, this is already taken care of. Unluckily, there are some optimizations in MySQL that don’t use straight pthread mutexes and so may be problematic on non-Intel CPUs. A few of these issues have sneaked into MySQL over the past few years. The most problematic area was around the optimized mutexes in InnoDB (you can use the pthread_mutex fallback code, but it’s less performant).

Luckily, I both knew where to look and there are good asserts throughout InnoDB code to help spot any other areas that I may not have initially thought of to look at. Coding defensively with a good amount of asserts is a good thing.

After not too much work, I have a set of patches that I’m fairly confident is correct and performs near as well as possible. Initially, I had a different patch that used heavyweight memory barriers in a lot of places, but big kudos to Yasufumi for posting a better patch than mine to bug 47213 – using the lighter weight barriers gives a decent performance boost.

One of the key patches is in the InnoDB mutex code to change the thread priority – i.e. a POWER equivalent to the x86 pause instruction. These are hints to the CPU that the thread being executed is in a spinloop and CPU resources should be allocated to other threads to make betterr forward progress.

After dragging Anton in to have a look and a think, this code may have motivated him to have a go at getting kernel support for adaptive mutexes, thus removing the need for this spin/sleep/yield/eep loop in InnoDB (at least on Linux).

So… I’ve spent the appropriate time filing bugs in the MySQL bug tracker for the things I’ve found. Feel free to track them yourself, they are:

  • Bug 72715: character set code endianness dependent on CPU type rather than endianness of CP
    • I don’t think this is an issue for us… or it could be that this is actually just incredibly untested code in the MySQL Server. It’s also not POWER specific, although was caught by the Migration Assistant which is part of the Advanced Toolchain from IBM.
  • Bug 72718: CACHE_LINE_SIZE in innodb should be 128 on POWER
    • I contributed a patch that’s a simple #ifdef for CPU type. Those who care about other CPU architectures should chime in with the correct value for them.
    • There’s other places in InnoDB where there’s some padding that don’t use this define, I need to file a bug for that.
  • Bug 72754: Set thread priority in InnoDB mutex spinloop
    • This makes a big difference when you have mutex contention and SMT (Symmetric Multi-Threading) enabled (on POWER, you can dynamically change SMT levels at runtime).
    • I’ve contributed a preliminary patch that isn’t generic. I should go and fix that.
  • Bug 72755: InnoDB mutex spin loop is missing GCC barrier
    • This also applies to x86 (and indeed all platforms). If GCC gets a bit smarter, the current code could compile down to nothing, which is exactly what you don’t want from a spinloop. The correct thing to do is to have a GCC memory barrier (not CPU one) to ensure that the compiler doesn’t optimize away the spinning.
    • I’ve contributed a patch, may need #ifdef GCC added.
  • Bug 72809: InnoDB Linux native aio setup missing barrier after setup
    • This appears to be a “POWER8 is fast” related bug :)
    • Patch contributed.
  • Bug 72811: Set NUMA mempolicy for optimum mysqld performance
    • Not POWER specific.
    • I’ve contributed a patch that sets NUMA memory allocation policy inside mysqld rather than having to run “numactl” manually
  • Bug 47213: InnoDB mutex/rw_lock should be conscious about memory ordering other than Intel
    • Originally filed by Yasufumi back in 2009.
    • Some good discussion going on here to ensure the patch is correct. This is the kind of patch that requires more review  than it takes to write it.
    • This patch would fix the majority of problems for non-Intel CPU architectures.
    • Thanks to Yasufumi for providing an updated patch, it helped a lot!
  • Bug 72544: Incorrect locking for global_query_id
    • I found a bug. Rather benign and not POWER specific.

Want to run MySQL 5.6.17 on POWER? Get my MySQL 5.6.17 patch here: https://flamingspork.com/mysql/mysql-5.6.17-POWER.patch

My accumulation of 5.6 patches seems fairly reliable. I’d test before putting into production, and I’d certainly love to know any problems you hit.

Get the quilt series of patches here: https://flamingspork.com/mysql/mysql-5.6.17-POWER-patches.tar.gz

I have, of course, done the legal wrangling for the Oracle Contributor Agreement (remarkably painless) and am working on making the patches completely acceptable to be merged into MySQL.