1 million SQL Queries Per Second: MySQL 5.7 on POWER8

I’ve previously covered MySQL 5.6 on POWER (with patch), MySQL 5.6 Performance on POWER8 (spoiler: new performance record) and MySQL 5.7 on POWER.

Of course, The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions. Also, these numbers should be considered preliminary, but trust me – I did get them and it’s not April 1st.

From my last post, you saw that with my preliminary patch for MySQL 5.7 to work on POWER, we could easily match the previous record for sysbench point select queries per second (i.e. key lookups). In fact, we could exceed the published record by a little bit which is kind of nice. At around 630kQPS, one could be rather happy.

But we still had 30-40% idle CPU on POWER8. This led me to file the following bug report:

  • Bug 72829: LOCK_grant is major contention point, leaves 30-40% idle CPU.

What’s going on is that there’s a rwlock in the MySQL Server that ensures that writers don’t collide with readers to the data structures describing the GRANTs (i.e. who has access to what). If you run a GRANT statement, it gets a writer lock, and nobody can read (i.e. check permissions) while everything is being updated. If you run a normal SQL statement, you get a read lock (non-exclusive) and can check permissions appropriately.

It’s been known for a long time that LOCK_grant was a bottleneck. Typically, some people have run with skip-grant-tables to help shorten the time the lock as held (as in MySQL you still take the mutex even though you’ve started the server with skip-grant-tables).

In Drizzle, we fixed that – moving authentication and authorization completely behind plugin APIs and if you didn’t load plugins for them, you executed near enough to zero instructions that it didn’t matter.

In my experiments, enabling skip-grant-tables actually hurt performance rather than helped. More investigation is needed, but it seems that simply the act of acquiring and releasing the rdlock is now a major bottleneck in some benchmarks (such as sysbench point select).

It turns out that this is a well known problem in other pieces of software (e.g. Linux kernel) and is pretty much what RCU (Read Copy Update) is best at. As far back as 2006 I remember attempting to get my head around RCU so that one day we could use it in MySQL or MySQL Cluster.

Another simpler method is simply splitting the mutex, with readers able to acquire any one of N mutexes and writers needing to acquire them all. This penalizes writers, but unless you’re executing a lot of GRANTs, you’re probably safe.

So… what is the theoretical maximum performance if this bottleneck went away?

I wrote a quick patch that just commented out the rdlock acquisition of LOCK_grant in the hot codepath of sysbench point selects. I wasn’t running GRANT statements at runtime so this was “safe”.

This patch is not production ready, it’s merely useful for demonstrating where we could be with MySQL 5.7 on POWER8 if one last bottleneck is fixed.

My results? Slightly over ONE MILLION QUERIES PER SECOND!

This is roughly twice the previous record.

This is with a dual socket 24 core POWER8 with SMT8 and DSCR=1 on 8 tables with sysbench 0.4.8. Sysbench itself is using a non-trivial amount of CPU and I could probably decently beat this number if I rewrote sysbench using the nonblocking API in libdrizzle (back when me made the Drizzle performance regression tests use a libdrizzle-ified sysbench we got double digit percentage improvement in our sysbench numbers).

There’s still around 7-10% idle CPU time… so there’s more room to grow.

Lacking a physical gauntlet to throw down, I’ll just have to submit a conference paper somewhere so that I can do that in person.

I really hope that we’re able to fix this bottleneck in MySQL 5.7 so that MySQL 5.7 will ship being able to do over a million queries per second. From SQL.