Detecting if a MySQL server supports partitioning

This morning, this Percona XtraBackup bug came to my attention: – basically, it’s now really quite tricky to determine if a MySQL server you’re connected to supports partitioning or not.

If you’re connected to anything less than MySQL 5.6, you can use have_partitioning variable. But since that’s gone in 5.6, you’re going to get a false negative if you’re connected to 5.6. You could use INFORMATION_SCHEMA.PLUGINS table, but that’s not there in 5.0, so you have some added workarounds to add there too.

A simple version check could be the solution… but what if you compiled the server without partitioning support?

Undocumented ALTER TABLE that does *nothing* (useful)

(at least since MySQL 5.1.42)

alter table t1 force;

Pretty neat huh? In fact, in Drizzle this will end up doing a copying alter table. Not useful.

There’s an over four year old bug report in MySQL (Bug#24091).

I’m just going to remove that bit from the parser in Drizzle – it makes no sense.

Storage Engine API state graph

Drizzle still has a number of quirks inherited from the MySQL Storage Engine API (e.g. BLOBs, row buffer, CREATE SELECT and lack of DDL transaction boundaries, key tuple format). One of the things we fixed a long time ago was to have proper methods for StorageEngines to be called for: startTransaction, startStatement, endStatement, commit and rollback.

If you’ve had to implement a transactional storage engine in MySQL you will be well aware of the pattern of “in every Storage Engine/handler call: if transaction doesn’t exist, begin.” We’ve tried to fix this in the Drizzle API for a number of reasons. I think having this obvious set of calls will make the API a lot easier to understand. I am also very interested in making things much easier to prove correct.

A while ago I spotted Bug 587772, which was the READ COMMITTED isolation level not working correctly with InnoDB. It turns out that the most basic example for READ COMMITTED failed. Hrrm… this is no good. It worked on MySQL, so this was certainly something that we broke. What was more worrying is that there wasn’t a test for this in the test suite (and at the time I couldn’t find one in the MySQL test suite either, so I think we inherited the missing test).

I recently started delving in, actually going to solve this. I noticed something worrying, endStatement wasn’t being called, which is where the innobase plugin would release the read view that it used for the statement. You’d think that it would grab a new one on startStatement, but because of the previous design of the API (remember “if txn isn’t started, start it!”) this also happened for getting the read view for the statement… so we instead got a REPEATABLE READ isolation level.

I wanted a test.

Previously, I’ve created a dummy storage engine (tableprototester) and used it to test the server code for reading the table protobuf message. I thought about doing a Storage Engine for this problem too, basically looking at the calls to the Storage Engine as transitions between states in a state machine.

A basic view of a transaction could be:

State transitions for a transaction. Transaction can be empty OR have one or more statementsThat is, a transaction starts and has zero or more statements before it commits or gets rolled back.

By coding up a data structure of allowable state transitions, a small function to assert() on invalid transitions and enough of the boilerplate to make the engine “work”, I was able to hit an assert() exactly where I’d expected it: at an invalid transition from START STATEMENT to COMMIT.

To fix the initial bug (READ COMMITTED not working), I filled in a few state transitions for the system as a whole that aren’t quite correct. From the diagram below, you can quite obviously see where the obvious bugs are (it helps that I’ve coloured them red):

There is absolutely no sense in going BEGIN -> END STATEMENT or immediately to COMMIT. These should be relatively easy to solve too, but are separate bugs.

I wish to expand this in the future to cover Cursor as well. It will also be useful to ensure that DDL can be wrapped in transactions. Not to mention the last few HTON flags that exist (and should likely go away).

To generate the diagrams, I just wrote a little utility to dump out the state transitions in dot, using it to generate the diagrams.

A tale of a bug…

So I sometimes get asked if we funnel back bug reports or patches back to MySQL from Drizzle. Also, MariaDB adds some interest here as they are a lot closer (and indeed compatible with) to MySQL. With Drizzle, we have deviated really quite heavily from the MySQL codebase. There are still some common areas, but they’re getting rarer (especially to just directly apply a patch).

Back in June 2009, while working on Drizzle at Sun, I found a bug that I knew would affect both. The patch would even directly apply (well… close, but I made one anyway).

So the typical process of me filing a MySQL bug these days is:

  • Stewart files bug
  • In the next window of Sveta being awake, it’s verified.

This happened within a really short time.

Unfortunately, what happens next isn’t nearly as awesome.

Namely, nothing. For a year.

So a year later, I filed it in launchpad for MariaDB.

So, MariaDB is gearing up for a release, it’s a relatively low priority bug (but it does have a working, correct and obvious patch), within 2 months, Monty applied it and improved the error checking around it.

So MariaDB bug 588599 is Fix Committed (June 2nd 2010 – July 20th 2010), MySQL Bug 45377 is still Verified (July 20th 2009 – ….).

(and yes, this tends to be a general pattern I find)

But Mark says he gets things through… so yay for him.2

ENUM now works properly (in Drizzle)

Over at the Drizzle blog, the recent 2010-06-07 tarball was announced. This tarball release has my fixes for the ENUM type, so that it now works as it should. I was quite amazed that such a small block of code could have so many bugs! One of the most interesting was the documented limit we inherited from MySQL (see the MySQL Docs on ENUM) of a maximum of 65,535 elements for an ENUM column.

This all started out from a quite innocent comment of Jay‘s in a code review for adding support for the ENUM data type to the embedded_innodb engine. It was all pretty innocent… saying that I should use a constant instead of the magic 0x10000 number as a limit on an assert for sanity of values getting passed to the engine. Seeing as there wasn’t a constant already in the code for that (surprise number 1), I said I’d fix it properly in a separate patch (creating a bug for it so it wouldn’t get lost) and the code went in.

So, now, a few weeks after that, I got around to dealing with that bug (because hey, this was going to be an easy fix that’ll give me a nice sense of accomplishment). A quick look in the Field_enum code raised my suspicions of bugs… I initially wondered if we’d get any error message if a StorageEngine returned a table definition that had too many ENUM elements (for example, 70,000). So, I added a table to the tableprototester plugin (a simple dummy engine that is loaded for testing the parsing of specially constructed table messages) that had 70,000 elements for a single ENUM column. It didn’t throw an error. Darn. It did, however, have an incredibly large result for SHOW CREATE TABLE.

Often with bugs like this I may try to see if the problem is something inherited from MySQL. I’ll often file a bug with MySQL as well if that’s the case. If I can, I’ll sometimes attach the associated patch from Drizzle that fixes the bug, sometimes with a patch directly for and tested on MySQL (if it’s not going to take me too long). If these patches are ever applied is a whole other thing – and sometimes you get things like “each engine is meant to have auto_increment behave differently!” – which doesn’t inspire confidence.

But anyway, the MySQL limit is somewhere between 10850 and 10900. This is not at all what’s documented. I’ve filed the appropriate bug (Bug #54194) with reproducible test case and the bit of problematic code. It turns out that this is (yet another) limit of the FRM file. The limit is “about 64k FRM”. The bit of code in MySQL that was doing the checking for the ENUM limit was this:

/* Hack to avoid bugs with small static rows in MySQL */
  if (info_length+(ulong) create_fields.elements*FCOMP+288+
      n_length+int_length+com_length > 65535L || int_count > 255)

So it’s no surprise to anyone how this specific limit (the number of elements in an ENUM) got missed when I converted Drizzle from using an FRM over to a protobuf based structure.

So a bunch of other cleanup later, a whole lot of extra testing and I can pretty confidently state that the ENUM type in Drizzle does work exactly how you think it would.

Either way, if you’re getting anywhere near 10,000 choices for an ENUM column you have no doubt already lost.

New CREATE TABLE performance record!

4 min 20 sec

So next time somebody complains about NDB taking a long time in CREATE TABLE, you’re welcome to point them to this :)

  • A single CREATE TABLE statement
  • It had ONE column
  • It was an ENUM column.
  • With 70,000 possible values.
  • It was 605kb of SQL.
  • It ran on Drizzle

This was to test if you could create an ENUM column with greater than 216 possible values (you’re not supposed to be able to) – bug 589031 has been filed.

How does it compare to MySQL? Well… there are other problems (Bug 54194 – ENUM limit of 65535 elements isn’t true filed). Since we don’t have any limitations in Drizzle due to the FRM file format, we actually get to execute the CREATE TABLE statement.

Still, why did this take four and a half minutes? I luckily managed to run poor man’s profiler during query execution. I very easily found out that I had this thread constantly running check_duplicates_in_interval(), which does a stupid linear search for duplicates. It turns out, that for 70,000 items, this takes approximately four minutes and 19.5 seconds. Bug 589055 CREATE TABLE with ENUM fields with large elements takes forever (where forever is defined as a bit over four minutes) filed.

So I replaced check_duplicates_in_interval() with a implementation using a hash table (boost::unordered_set actually) as I wasn’t quite immediately in the mood for ripping out all of TYPELIB from the server. I can now run the CREATE TABLE statement in less than half a second.

So now, I can run my test case in much less time and indeed check for correct behaviour rather quickly.

I do have an urge to find out how big I can get a valid table definition file to though…. should be over 32MB…

Fun with the 387

Filed  GCC bug 39228:

#include <stdio.h>
#include <math.h>
int main()
        double a= 10.0;
        double b= 1e+308;
        printf("%d %d %dn", isinf(a*b), __builtin_isinf(a*b), __isinf(a*b));
        return 0;

mtaylor@drizzle-dev:~$ gcc -o test test.c
mtaylor@drizzle-dev:~$ ./test
0 0 1
mtaylor@drizzle-dev:~$ gcc -o test test.c -std=c99
mtaylor@drizzle-dev:~$ ./test
1 0 1
mtaylor@drizzle-dev:~$ gcc -o test test.c   -mfpmath=sse -march=pentium4
mtaylor@drizzle-dev:~$ ./test
1 1 1
mtaylor@drizzle-dev:~$ g++ -o test test.c
mtaylor@drizzle-dev:~$ ./test
1 0 1

Originally I found the simple isinf() case to be different on x86 than x86-64, ppc32 and sparc (32 and 64).

After more research, I found that x86-64 uses the sse instructions to do it (and using sse is the only way for __builtin_isinf() to produce correct results). For the g++ built version, it calls __isinf() instead of inlining (and as can be seen, the __isinf() version is always correct).

Specifically, it’s because the optimised 387 code is doing the math in double extended precision inside the FPU. 10.0*1e308 fits in 80bits but not in 64bit. Any code that forces it to be stored and loaded gets the correct result too. e.g.

mtaylor@drizzle-dev:~$ cat test-simple.c

#include <stdio.h>
#include <math.h>
int main()
        double a= 10.0;
        double b= 1e+308;
    volatile    double c= a*b;
        printf("%dn", isinf(c));
        return 0;

mtaylor@drizzle-dev:~$ gcc -o test-simple test-simple.c
mtaylor@drizzle-dev:~$ ./test-simple

With this code you can easily see the load and store:

 8048407:       dc 0d 18 85 04 08       fmull  0x8048518 804840d:       dd 5d f0                fstpl  -0x10(%ebp)
 8048410:       dd 45 f0                fldl   -0x10(%ebp)
 8048413:       d9 e5                   fxam

While if you remove volatile, the load and store doesn’t happen (at least on -O3, on -O0 it hasn’t been optimised away):

 8048407:       dc 0d 18 85 04 08       fmull  0x8048518
 804840d:       c7 44 24 04 10 85 04    movl   $0x8048510,0x4(%esp)
 8048414:       08
 8048415:       c7 04 24 01 00 00 00    movl   $0x1,(%esp)
 804841c:       d9 e5                   fxam

This is also a regression from 4.2.4 as it just calls isinf() and doesn’t expand the 387 code inline. My guess is the 387 optimisation was added in 4.3.

Recommended fix: store and load in the 387 version so to operate on same precision as elsewhere.

Now I just have to make a patch I like that makes Drizzle behave because of this (showed up as a failure in the SQL func_math test) and then submit to MySQL as well… as this may happen there if “correctly” built.

irritation of the day….

There’s a lot of things about the MySQL bug tracking system i like… but there’s a few things that annoy the heck out of me.

Today it’s the fact that if you put a term in the “with any of the words” field on advanced search that’s an number (e.g. ‘839’ as you’re looking for bugs that talk about error 839) you get taken to bug numebr 839. Funnily enough, this has nothing to do with an NDB problem I’m trying to see the status of. grr…

and now, back to your regular programming…