volatile considered harmful

While playing with MySQL 5.7.5 on POWER8, I came across a rather interesting bug (74775 – and this is not the only one… I think I have a decent amount of auditing and patching to do now) which made me want to write a bit on memory barriers and the volatile keyword.

Memory barriers are hard.

Like, super hard. It’s the kind of thing that makes you curse hardware designers, probably because they’re not magically solving all your problems for you. Basically, as you get more CPU cores and each of them have caches, it gets more expensive to keep everything in sync. It’s quite obvious that with *ahem* an eventually consistent model, you could save a bunch of time and effort at the expense of shifting some complexity into software.

Those in the MySQL world should recognize this – we’ve been dealing with asynchronous replication for well over a decade as a good way to scale.

On some CPU architectures (POWER for example) not all loads are created equal. When you load a value from memory, it will be consistent with your thread of execution. That is, with any stores that you have done in this thread of execution. If another thread updates that memory location you may not see that update even if your load occurs after that thread updates that memory location. Think eventually consistent.

If you want up to date reads (and not clobber writes), then you get to do memory barriers! (a topic for elsewhere – the PowerISA document has good explanations of what we have on POWER though, and how load with reserve works).

What the volatile keyword does is generate load and store instructions. It is useful when talking to hardware, as the load and store instructions are actually doing something there that the compiler doesn’t know about and thus shouldn’t optimize away.

The volatile keyword does not add any memory barriers. This is important to realize – volatile just makes loads and stores happen for your thread, not in relation to any other threads of execution. Thus, you cannot use volatile as a thread synchronization mechanism at all. It is completely and totally wrong.

Basically, if you have a volatile variable and you do stores to it in one thread and loads in another, after the store happens, it could be quite a long time before the thread doing the loads sees it! For some applications this may be okay (although I can’t really think of any beyond very very inaccurate status variables)… but if it matters at all for application correctness, volatile is the wrong thing to use.

Further reading:

The MEMORY storage engine

I recently wrote about Where are they now: MySQL Storage Engines and The MERGE storage engine: not dead, just resting…. or forgotten. Today, it’s the turn of the MEMORY storage engine – otherwise known as HEAP.

This is yet another piece of the MySQL server that sits largely unmaintained and unloved. The MySQL Manual even claims that it supports encryption… with the caveat of having to use the SQL functions for encryption/decryption rather than in the engine itself (so, basically, it supports encryption about as much as every other engine does).

The only “recent” innovation in the MEMORY engine was the dynamic row patch that ended up making its way into Percona Server (and isn’t enabled by default). This forced me to go and look at the code of the MEMORY engine again and I cannot possibly drink enough in my lifetime to erase the memory.

The MEMORY engine is used by just about everybody as you probably have a SQL query somewhere that uses an in memory temporary table. I can, however, feel the comments being added to this post right now by people who use gdb to set server variables that not a single query in their systems use MEMORY….. (IIRC there have been some patches around that would throw an error rather than create a temporary table)

We had a early version of the dynamic row format patch in Drizzle for a while… and if you turned it on, all sorts of things horrifically broke. It was a remarkably non-trivial amount of work to get that code to work properly and this is largely a testament to the “design” of the MEMORY engine.

While it may be efficient or fast or something (likely on 1990s hardware and workloads), it misses the boat completely on the things that matter today: simultaneous access, MVCC, BLOB/TEXT columns and transactions. Basically, it’s a engine that’s really only useful for a single connection in limited use cases…. and even then, it’s likely a good way to ruin things. MyISAM is better as at least on memory pressure things may be written out to disk sensibly…. and if InnoDB had a “don’t log this table” mode it would beat that absolute pants off it.

It is, again, another part of the MySQL server that’s remarkably hard to pull out and replace with something different/better. Why? Well, I wrote about it before: Refactoring Internal temporary tables (another stab at it). If it was easy, we’d likely have Tokyo Cabinet (via BlitzDB) or similar (some bit of code maintained by other people) doing the same job in Drizzle rather than this large chunk of code that nobody really cares about.