It’s coming up on a year since I started working full time on Drizzle. So, I got a bit reflective…
Have we done things that I (and others) really wanted done? Back in 2007, I wrote my top 5 wishlist for the MySQL Server.
I am not going to pretend I speak for the MySQL development team; I’m just trying to evaluate how Drizzle is doing against some wishlists that (to me) embodied some of the reasons we started Drizzle.
Please think of this as taking “database server wishlists” and comparing them against Drizzle.
My wishlist was:
5. Six-monthly release cycles
Done. Not only does Drizzle have milestone releases, but we’re also dropping tarballs every two weeks (currently for the bell milestone). We’re also doing a decent job of keeping trunk free of massive breakage.
4. Much more in depth automated testing
Done (and in progress). We have drizzle-automation running things all the time. Hudson (and buildbot) test across many platforms before code hits trunk (what pushbuild did inside MySQL, and more). We also run regular performance benchmarks that we compare across versions, as well as crash-me and the random query generator, and we check that we don’t regress in code size (via sloccount).
3. Sane build system
(slightly distorting my original words). Well, we’re not quite at the “ready for packaging in distributions” stage, but a debug or non-debug build is just -g and an optimization level for the compiler, and plugins use autofoo to work out if they can be built… so yeah, this is pretty sane.
We also are building with -Werror (and more!) which increases code quality no end.
So, mostly done.
3.5. (yes, I have a 3.5): Kill HPUX
2. Increased liberal use of asserts
An in-progress thing, but the better compiler warnings have won us a lot.
1. Pluggable data dictionary
Not only that, but done away with FRM totally. Really happy with this.
What about other people’s wishlists?
Kostya had one:
1. Remove excessive fuss.
i.e. “just do it”.
I think we’re doing really well with this for Drizzle. Plugins are pretty easy to get merged, and if your patches to the kernel are good, they’re also easy. Big changes can be harder, but in the end it has turned out well.
2. Open the development process.
Done. There is no internal wiki, and there are no “committers” versus “non-committers”; everything is judged on the merit of the idea/code. Sometimes the most valuable contribution is somebody telling you their real world experience.
3. Get to a normal release schedule.
4. Establish productive relationship with the majority of users.
I think the drizzle-discuss mailing list is doing quite well in this regard; it is quite active with discussion.
5. Find a way to do incompatible changes with minimal pain for users.
We’ll see how we go :)
1. Real time Query Monitoring
With gearman logging and my recent experimentation with using CPU performance counters I think we’ll end up somewhere rather awesome.
If you’re looking for MySQL monitoring though, the MySQL Enterprise query monitoring stuff looks pretty good to me.
2. Consistent Release Cycles
We’re doing pretty well so far!
3. INFORMATION_SCHEMA Extensions
We’ve inherited from MySQL 5.1 (and 6.0) the architecture that lets you add INFORMATION_SCHEMA tables, and we’ve improved it, so adding them is currently pretty easy. We also have ongoing work on an INFORMATION_SCHEMA storage engine, which means that the I_S tables won’t have to be materialized every time you query them.
4. Online table maintenance
All progress has been due to Storage Engine authors. With the data dictionary work though, this gets easier and saner to do.
5. Published benchmarks
We’re encouraging others who will be more objective :) Although we also do regular performance regression tests as part of our standard development process.
Dormando also had a list (complete with “there is no five”):
1) Logical separation of connections from threads
We have this in Drizzle through plugins. Interesting ones are pool_of_threads (fixed number of threads), multithread (thread per connection) and single_thread (one thread).
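To make the idea concrete, here is a minimal Python sketch of what separating connections from threads can look like. It is not Drizzle's plugin API: the class names just mirror the three plugin names above, and the `submit`/`shutdown` interface and the `run_sessions` helper are made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadScheduler:
    """Runs every session's work inline on the calling thread."""

    def submit(self, work, *args):
        work(*args)

    def shutdown(self):
        pass


class MultiThreadScheduler:
    """Starts one new thread per session (thread per connection)."""

    def __init__(self):
        self._threads = []

    def submit(self, work, *args):
        thread = threading.Thread(target=work, args=args)
        thread.start()
        self._threads.append(thread)

    def shutdown(self):
        for thread in self._threads:
            thread.join()


class PoolOfThreadsScheduler:
    """Shares a fixed number of worker threads among all sessions."""

    def __init__(self, size):
        self._pool = ThreadPoolExecutor(max_workers=size)

    def submit(self, work, *args):
        self._pool.submit(work, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def run_sessions(scheduler, session_count):
    """Hand session_count sessions to a scheduler; return the thread names used."""
    used = set()
    lock = threading.Lock()

    def handle(session_id):
        # Hypothetical per-session work: just record which thread served it.
        with lock:
            used.add(threading.current_thread().name)

    for session_id in range(session_count):
        scheduler.submit(handle, session_id)
    scheduler.shutdown()
    return used
```

The code that accepts sessions only ever talks to `submit`, so swapping the threading model is a matter of picking a different scheduler object, which is roughly the property the wishlist item asks for.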
2) A more modular core
We’re very much doing well here. It’s a long process, but I’m quite impressed by our progress.
3) Better replication (better replication management/protocol?)
The work being done on Drizzle replication is really exciting. I love the fact that modularity is encouraged and the ability to replace any bit you want easily, as well as read the replication stream in about any language you want.
4) Better test suite
I may never be 100% happy with a test suite, but we’re doing good…
PeterZ’s view is always interesting, and he had one too:
1. Be Pluggable
Check. (and also, of course, in progress)
2. Be Scalable
We’ve done a lot of work on scaling across many CPUs and many connections. Really, 8 concurrent database connections just isn’t interesting. We even run up to 2048 concurrent connections as part of our regression suite.
3. Be Distributed
With the new protocol work to have built in sharding, plugins for logging and replication via Gearman, we’re getting better.
4. Be Solid
This will be a test for us. I think we should end up pretty good because of a number of reasons:
- clearer, easier to understand code without nasty side effects or really odd things (e.g. relying on a bool storing the value 2)
- Better modularity (a module you don’t use and don’t load cannot screw you up)
- Smaller core and removal of problematic features.
- All the testing stuff I’ve previously mentioned.
So I hope we’re going to be okay here.
5. Don’t forget about the roots
The group of us working at Sun on Drizzle have said we want to focus on being awesome for large scale Web apps while enabling others to make Drizzle good for other things. I think this is the right approach to not forget our roots (and target users) while allowing it to be awesome for whatever use somebody has for it.
From a Storage Engine author PoV, Paul had some insights while thinking about PBXT:
1. A generic engine test suite
We’re doing pretty well… the whole Drizzle test suite runs with InnoDB, and doesn’t require much change to get going with another transactional engine. The proof is in the Drizzle PBXT branch! But I also think we could do better and have a test suite more directed at each part of an engine (including error cases!).
2. Internal APIs
Paul mentions FRMs, which are gone :) In their place is a simple interface that engines can implement for ever increased functionality (i.e. they own their own metadata). We’re getting better in other places too.
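As a rough illustration of “engines own their own metadata”, here is a small Python sketch. The interface and method names (`create_table`, `get_table_definition`, `list_tables`) are hypothetical and are not the real Drizzle storage engine API; the point is only that the server asks the engine for table definitions instead of reading a separate file such as an FRM.

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical engine interface: the engine stores table definitions itself."""

    @abstractmethod
    def create_table(self, name, definition):
        """Persist a new table definition inside the engine."""

    @abstractmethod
    def get_table_definition(self, name):
        """Return the stored definition for a table."""

    @abstractmethod
    def list_tables(self):
        """Return the names of all tables the engine knows about."""


class InMemoryEngine(StorageEngine):
    """Toy engine that keeps its metadata in a dict."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, definition):
        if name in self._tables:
            raise ValueError(f"table {name!r} already exists")
        # Copy so later changes by the caller don't alter stored metadata.
        self._tables[name] = dict(definition)

    def get_table_definition(self, name):
        return dict(self._tables[name])

    def list_tables(self):
        return sorted(self._tables)


engine = InMemoryEngine()
engine.create_table("t1", {"columns": ["id", "name"]})
```

Because the definition lives with the engine, there is no second copy on disk that can drift out of sync with what the engine believes the table looks like.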
3. Customizable table and column attributes
MariaDB has this now, and we have space in our table definition proto, but not at the parser level (yet).
4. Push-down restrict and join conditions.
Not yet, and not for a little while.
5. Custom data types
It’ll be great when we rework the type system even more so that this really is as easy as it should be – not only from a SQL level but also for adding new types as server modules.
Finally, Antony’s wish list:
1. Modular Architecture
2. libmysys as a separate project
We’ve removed it where we can, and are using gnulib where we can. It very much improves the situation when you ditch weird-ass platforms and assume some level of POSIX.
3. New/modular parser
We’re getting close to a stage where you could load a different parser… not there, but relatively close. It would still be messy, but a lot better than even 6 months ago.
4. Unit tests for server components
With our move towards modularity, this is actually getting possible!
5. Aggregate Stored functions and External Stored Procedures
We don’t currently have either, though we have decent thoughts on both.
Antony also cheated and added a few more:
A new Recursive Descent parser
The required work to be able to replace the parser is in progress.
Abstract Syntax Trees
See above.. getting the pre-work done.
SCTP and/or link aggregation
We’ll see improvements around this with the new drizzle protocol.
Parsing within the client
We’ve had some very good discussion on the drizzle-discuss list. We’ll no doubt have something to help remove more of the cycles used in executing a query.
That’s the game plan :)
Elimination of FRM Files
I never get tired of saying this is done :)
Elimination of errmsg.sys
We’re now just using gettext, like every other free software project on the planet. Although I think we could take a few steps in making errors more easily parsable by code.
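One possible shape for “errors more easily parsable by code” is to keep a stable, untranslated error code next to the gettext-translated message. The sketch below uses Python's standard `gettext` module; the `ServerError` class and the `TABLE_NOT_FOUND` code are invented for illustration and are not anything Drizzle defines.

```python
import gettext

# With no translation catalog installed, gettext returns the message unchanged.
_ = gettext.gettext


class ServerError(Exception):
    """Hypothetical error type: stable code for programs, translated text for people."""

    def __init__(self, code, template, **params):
        self.code = code        # never translated, safe to match on in code
        self.params = params    # structured details, also untranslated
        super().__init__(_(template).format(**params))

    def to_record(self):
        """Return a form that client code can inspect without parsing text."""
        return {"code": self.code, "params": self.params, "message": str(self)}


err = ServerError("TABLE_NOT_FOUND", "Table '{name}' does not exist", name="t1")
record = err.to_record()
```

A client can branch on `record["code"]` regardless of the user's locale, while the human-readable message still goes through gettext like everything else.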
So how do we stack up?
I think we’re doing pretty well. There’s still a lot of work to get where we want to be, but it’s amazing how much progress we’ve made in the short time we’ve been around.
I also just realized I missed Jay’s list… but we’re doing pretty well there too.