Return of the “Top 5 MySQL Wishlist” and looking at Drizzle

It’s coming up on a year since I started working full time on Drizzle. So, I got a bit reflective…

Have we done things that I (and others) really wanted done? Back in 2007, I wrote my top 5 wishlist for the MySQL Server.

I am not going to pretend I speak for the MySQL development team; I’m just trying to evaluate how Drizzle is doing against some wishlists that (to me) embodied some of the reasons we started Drizzle.

Please think of this as “database server wishlists” and comparing them against Drizzle….

My wishlist was:

5. Six-monthly release cycles

Done. Not only does Drizzle have milestone releases, but we’re also dropping tarballs every two weeks (currently for the bell milestone). We’re also doing a decent job of keeping trunk free of massive breakage.

4. Much more in depth automated testing

Done (and in progress). We have drizzle-automation running things all the time. Hudson (and buildbot) test across many platforms (what pushbuild did and more when inside MySQL) before code hits trunk. We also have regular performance benchmarks that we compare across versions, crash-me, the random query generator as well as checking that we don’t regress in code size (via sloccount).

3. Sane build system

(slightly distorting my original words). Well, we’re not quite at the ready for packaging in distributions stage, but a debug or non debug build is just -g and optimization level for the compiler, plugins are using autofoo to work otu if they can be built… so yeah, this is pretty sane.

We also are building with -Werror (and more!) which increases code quality no end.

So, mostly done.

3.5. (yes, i have a 3.5): Kill HPUX

Done.

2. Increased liberal use of asserts

An in-progress thing, but the better compiler warnings have won us a lot.

1. Pluggable data dictionary

Not only that, but done away with FRM totally. Really happy with this.

What about other peoples wishlist?

Kostya had one:

1. Remove excessive fuss.

i.e. “just do it”.

I think we’re doing really well with this for Drizzle. Plugins are pretty easy to get merged, and if your patches to the kernel are good, they’re also easy. Big changes can be harder, but in the end it has turned out well.

2. Open the development process.

Done. There is no internal wiki, there are no “committers” versus “non-committers”.. everything is judged on merit of the idea/code. Sometimes the most valuable contribution is somebody telling you their real world experience.

3. Get to a normal release schedule.

Done.

4. Establish productive relationship with the majority of users.

I think drizzle-discuss mailing list is doing quite well in this regard. Is quite active with discussion.

5. Find a way to do incompatible changes with minimal pain for users.

We’ll see how we go :)

Ronald also had a list:

1. Real time Query Monitoring

With gearman logging and my recent experimentation with using CPU performance counters I think we’ll end up somewhere rather awesome.

If you’re looking for MySQL monitoring though, the MySQl Enterprise query monitoring stuff looks pretty good to me.

2. Consistent Release Cycles

We’re doing pretty well so far!

3. INFORMATION_SCHEMA Extensions

We’ve inherited the architecture from MySQL 5.1 (and 6.0) of being able to pretty easily add INFORMATION_SCHEMA tables and improved it. It’s currently pretty easy to add them. We also have ongoing work having an INFORMATION_SCHEMA storage engine which means that you won’t have to have the I_S tables be materialized every time you query them.

4. Online table maintenance

All progress has been due to Storage Engine authors. With the data dictionary work though, this gets easier and saner to do.

5. Published benchmarks

We’re encouraging others who will be more objective :) Although we also do regular performance regression tests as part of our standard development process.

Dormando also had a list (complete with “there is no five”):

1) Logical separation of connections from threads

We have this in Drizzle through plugins. Interesting ones are pool_of_threads (fixed number of threads), multithread (thread per connection) and single_thread (one thread).

2) A more modular core

We’re very much doing well here. It’s a long process, but I’m quite impressed by our progress.

3) Better replication (better replication management/protocol?)

The work being done on Drizzle replication is really exciting. I love the fact that modularity is encouraged and the ability to replace any bit you want easily, as well as read the replication stream in about any language you want.

4) Better test suite

I may never be 100% happy with a test suite, but we’re doing good…

PeterZ‘s view is always interesting, and he had one too:

1. Be Pluggable

Check. (and also, of course, in progress)

2. Be Scalable

We’ve done a lot of work scaling on many CPUs and many connections. Really, 8 concurrent database connections just isn’t interesting. We ever run as part of our regression suite up to 2048 concurrent connections.

3. Be Distributed

With the new protocol work to have built in sharding, plugins for logging and replication via Gearman, we’re getting better.

4. Be Solid

This will be a test for us. I think we should end up pretty good because of a number of reasons:

    • clearer, easier to understand code without nasty side effects or really odd things (e.g. relying on a bool storing the value 2)
    • Better modularity (a module you don’t use and don’t load cannot screw you up)
    • Smaller core and removal of problematic features.
    • All the testing stuff I’ve previously mentioned.

So I hope we’re going to be okay here.

5. Don’t forget about the roots

The group of us working at Sun on Drizzle have said we want to focus on being awesome for large scale Web apps while enabling others to make Drizzle good for other things. I think this is the right approach to not forget our roots (and target users) while allowing it to be awesome for any use somebody wants to have of it.

From a Storage Engine author PoV, Paul had some insights while thinking about PBXT:

1. A generic engine test suite

We’re doing pretty well… the whole Drizzle test suite runs with InnoDB, and doesn’t require much change to get going with another transactional engine. The proof is in the Drizzle PBXT branch! But I also think we could do better and have a test suite more directed at each part of an engine (including error cases!).

2. Internal APIs

Paul mentions FRMs, which are gone :) In their place is a simple interface that engines can implement for ever increased functionality (i.e. they own their own metadata). We’re getting better in other places too.

3. Customizable table and column attributes

MariaDB has this now, and we have space in our table definition proto, but not at the parser level (yet).

4. Push-down restrict and join conditions.

Not yet, and not for a little while.

5. Custom data types

It’ll be great when we rework the type system even more so that this really is as easy as it should be – not only from a SQL level but also for adding new types as server modules.

Finally, Antony’s wish list:

1. Modular Architecture

Covered.

2. libmysys as a separate project

We’ve removed it where we can, and are using gnulib where we can. It very much improves the situation when you ditch weird-ass platforms and assume some level of POSIX.

3. New/modular parser

We’re getting close to a stage where you could load a different parser… not there, but relatively close. It would still be messy, but a lot better than even 6 months ago.

4. Unit tests for server components

With our move towards modularity, this is actually getting possible!

5. Aggregate Stored functions and External Stored Procedures

We don’t currently have either, we have decent thoughts though.

Antony also cheated and added a few more:

A new Recursive Descent parser

The required work to be able to replace the parser is in progress.

Abstract Syntax Trees

See above.. getting the pre-work done.

SCTP and/or link aggregation

We’ll see improvements around this with the new drizzle protocol.

Parsing within the client

We’ve had some very good discussion on the drizzle-discuss list. We’ll no doubt have something to help remove more of the cycles used in executing a query.

Integrated Federation

That’s the game plan :)

Elimination of FRM Files

I never get tired of saying this is done :)

Elimination of errmsg.sys

We’re now just using gettext, like every other free software project on the planet. Although I think we could take a few steps in making errors more easily parsable by code.

So how do we stack up?

I think we’re doing pretty well. There’s still a lot of work to get where we want to be, but it’s amazing how much progress we’ve made in the short time we’ve been around.

I also just realized I missed Jay’s list… but we’re doing pretty well there too.

My Top 5 Wishlist for MySQL

I’m going and stealing Jay’s idea (who stole it off Brian Duff… but his was for Oracle so obviously doesn’t count :)

So, my five wishes for MySQL Are:

5. Six-monthly release cycles

Getting a release out there takes way too long. There’s a variety of reasons, but seeing the amazing success of other free software projects taking the shorter release cycle, with each release not being too ambitious, I’m pretty convinced.

Although I think our increased use of pushbuild has helped immensely with the general quality of the tree, there’s a lot more that can be done…

4. Much more in depth automated testing

For MySQL Cluster we have (in parts) an insanely detailed test suite… lots of error injection to test failure (or, more importantly, recovery), verifying data, transactional consistency etc. Also, tests that run multiple connections, threads, simultaneous transactions and test the results programatically (comparing strings is generally not very useful… even when it comes to something as simple as IPv4 or IPv6 address!).

Part of getting here is the SoC project I’m mentoring… which should have an output of some utilities for helping write better tests (rather like we have for NDB… but for SQL instead of NDB API).

3. Sane build system

Something that not just the build team knows how to produce a binary that’s released. The other side of this is a way to build your local working tree (and test it) on BLAH platform, without pushing.

3.5. (yes, i have a 3.5)

Kill HPUX. There’s a reason the name sounds like a disease.

2. Increased liberal use of asserts

This is just a wish from a bug I’ve been tracking down where the code to build an I_S table didn’t check that we opened all the tables successfully before calling handler::info()… which promptly goes fooey because the handler hasn’t opened the table. Luckily here it’s ended in a crash… IMNSHO it’s better to bail out on an assert than possibly return crap to the user…

1. Pluggable data dictionary

There are so many oddities and “bugs” around related to the casually strange mix of data dictionaries around the MySQL Server (and Federated, NDB, InnoDB, Falcon etc) that we could do a lot better if we had a “Virtual Data Dictionary Layer”… where engines who really do have a reason to not be simple (e.g. NDB) could plug in and the MySQL Server could get all the metadata consistently from the cluster (with out own form of sane-ish caching).

This would also solve the online ALTER TABLE frm consistency problem with engines that may roll back the alter table on crash recovery (or not support querying the index until it has finished being built).

Perhaps this is something for either the next engine summit or dev conf…