With JSON functionality, alternate protocols (HTTP, memcache), a move towards saner defaults and crash safety, pluggable logging etc it really looks like MySQL is following what we did in Drizzle years ago, which is great!
I want to emphasize this for those who have not spent time near trademarks: trademarks are trouble and another one of those things where no matter what, the lawyers always win. If you are starting a company or an open source project, you are going to have to spend a whole bunch of time with lawyers on trademarks or you are going to get properly, properly screwed.
MySQL AB always held the trademark for MySQL. There’s this strange thing with trademarks and free software, where while you can easily say “use and modify this code however you want” and retain copyright on it (for, say, selling your own version of it), this does not translate too well to trademarks as there’s a whole “if you don’t defend it, you lose it” thing.
The law, is, in effect, telling you that at some point you have to be an arsehole to not lose your trademark. (You can be various degrees of arsehole about it when you have to, and whenever you do, you should assume that people are acting in good faith and just have not spent the last 40,000 years of their life talking to trademark lawyers like you have).Basically, you get to spend time telling people that they have to rename their product from “MySQL Headbut” to “Headbut for MySQL” and that this is, in fact, a really important difference.
You also, at some point, get to spend a lot of time talking about when the modifications made by a Linux distribution to package your software constitute sufficient changes that it shouldn’t be using your trademark (basically so that you’re never stuck if some arse comes along, forks it, makes it awful and keeps using your name, to the detriment of your project and business).
If you’re wondering why Firefox isn’t called Firefox in Debian, you can read the Mozilla trademark policy and probably some giant thread on debian-legal I won’t point to.
Of course, there’s ‘ MySQL trademark policy and when I was at Percona, I spent some non-trivial amount of time attempting to ensure we had a trademark policy that would work from a legal angle, a corporate angle, and a get-our-software-into-linux-distros-happily angle.
So, back in 2010, Monty started talking about a draft MariaDB trademark policy (see also, Ubuntu trademark policy, WordPress trademark policy). If you are aiming to create a development community around an open source project, this is something you need to get right. There is a big difference between contributing to a corporate open source product and an open source project – both for individuals and corporations. If you are going to spend some of your spare time contributing to something, the motivation goes down when somebody else is going to directly profit off it (corporate project) versus a community of contributors and companies who will all profit off it (open source project). The most successful hybrid of these two is likely Ubuntu, and I am struggling to think of another (maybe Fedora?).
Linux is an open source project, RedHat Enterprise Linux is an open source product and in case it wasn’t obvious when OpenSolaris was no longer Open, OpenSolaris was an open source product (and some open source projects have sprung up around the code base, which is great to see!). When a corporation controls the destiny of the name and the entire source code and project infrastructure – it’s a product of that corporation, it’s not a community around a project.
From the start, it seemed that one of the purposes of MariaDB was to create a developer community around a database server that was compatible with MySQL, and eventually, to replace it. MySQL AB was not very good at having an external developer community, it was very much an open source product and not a an open source project (one of the downsides to hiring just about anyone who ever submitted a patch). Things struggled further at Sun and (I think) have actually gotten better for MySQL at Oracle – not perfect, I could pick holes in it all day if I wanted, but certainly better.
When we were doing Drizzle, we were really careful about making sure there was a development community. Ultimately, with Drizzle we made a different fatal error, and one that we knew had happened to another open source project and nearly killed it: all the key developers went to work for a single company. Looking back, this is easily my biggest professional regret and one day I’ll talk about it more.
Brian Aker observed (way back in 2010) that MariaDB was, essentially, just Monty Program. In 2013, I did my own analysis on the source tree of MariaDB 5.5.31 and MariaDB 10.0.3-ish to see if indeed there was a development community (tl;dr; there wasn’t, and I had the numbers to prove it).If you look back at the idea of the Open Database Alliance and the MariaDB Foundation, actually, I’m just going to quote Henrik here from his blog post about leaving MariaDB/Monty Program:
When I joined the company over a year ago I was immediately involved in drafting a project plan for the Open Database Alliance and its relation to MariaDB. We wanted to imitate the model of the Linux Foundation and Linux project, where the MariaDB project would be hosted by a non-profit organization where multiple vendors would collaborate and contribute. We wanted MariaDB to be a true community project, like most successful open source projects are – such as all other parts of the LAMP stack.
The reality today, confirmed to me during last week, is that:
Those in charge at Monty Program have decided to keep ownership of the MariaDB trademark, logo and mariadb.org domain, since this will make the company more valuable to investors and eventually to potential buyers.
Now, with Monty Program being sold to/merged into (I’m really not sure) SkySQL, it was SkySQL who had those things. So instead of having Monty Program being (at least in theory) one of the companies working on MariaDB and following the Hacker Business Model, you now have a single corporation with all the developers, all of the trademarks, that is, essentially a startup with VC looking to be valuable to potential buyers (whatever their motives).
Again, I’m going to just quote Henrik on the us-vs-them on community here:
Some may already have observed that the 5.2 release was not announced at all on mariadb.org, rather on the Monty Program blog. It is even intact with the “us vs them” attitude also MySQL AB had of its community, where the company is one entity and “outside community contributors” is another. This is repeated in other communication, such as the recent Recently in MariaDB newsletter.
This was, again, back in 2010.
More recently, Jeremy Cole, someone who has pumped a fair bit of personal and professional effort into MySQL and MariaDB over the past (many) years, asked what seemed to be a really simple question on the maria-discuss mailing list. Basically, “What’s going on with the MariaDB trademark? Isn’t this something that should be under the MariaDB foundation?”
The subsequent email thread was as confusing as ever and should be held up as a perfect example about what not to do. Some of us had by now, for years, smelt something fishy going on around the talk of a community project versus the reality. At the time (October 2013), Rasmus Johansson (VP of Engineering at SkySQL and Board Member of MariaDB foundation) said this:
The MariaDB Foundation and SkySQL are currently working on the trademark issue to come up with a solution on what rights to the trademark each entity should have. Expect to hear more about this in a fairly near future.
MariaDB has from its beginning been a very community friendly project and much of the success of MariaDB relies in that fact. SkySQL of course respects that.
(and at the same time, there were pages that were “Copyright MariaDB” which, as it was pointed out, was not an actual entity… so somebody just wasn’t paying attention). Also, just to make things even less clear about where SkySQL the corporation, Monty Program the corporation and the MariaDB Foundation all fit together, Mark Callaghan noticed this text up on mariadb.com:
The MariaDB Foundation also holds the trademark of the MariaDB server and owns mariadb.org. This ensures that the official MariaDB development tree<https://code.launchpad.net/maria> will always be open for the MariaDB developer community.
So…. there’s no actual clarity here. I can imagine attempting to get involved with MariaDB inside a corporation and spending literally weeks talking to a legal department – which thrills significantly less than standing in lines at security in an airport does.
So, if you started off as yay! MariaDB is going to be a developer community around an open source project that’s all about participation, you may have even gotten code into MariaDB at various times… and then started to notice a bit of a shift… there may have been some intent to make that happen, to correct what some saw as some of the failings of MySQL, but the reality has shown something different.
Most recently, SkySQL has renamed themselves to MariaDB. Good luck to anyone who isn’t directly involved with the legal processes around all this differentiating between MariaDB the project, MariaDB Foundation and MariaDB the company and who owns what. Urgh. This is, in no way, like the Linux Foundation and Linux.
Personally, I prefer to spend my personal time contributing to open source projects rather than products. I have spent the vast majority of my professional life closer to the corporate side of open source, some of which you could better describe as closer to the open source product end of the spectrum. I think it is completely and totally valid to produce an open source product. Making successful companies, products and a butt-ton of money from open source software is an absolutely awesome thing to do and I, personally, have benefited greatly from it.
MariaDB is a corporate open source product. It is no different to Oracle MySQL in that way. Oracle has been up front and honest about it the entire time MySQL has been part of Oracle, everybody knew where they stood (even if you sometimes didn’t like it). The whole MariaDB/Monty Program/SkySQL/MariaDB Foundation/Open Database Alliance/MariaDB Corporation thing has left me with a really bitter taste in my mouth – where the opportunity to create a foundation around a true community project with successful business based on it has been completely squandered and mismanaged.
I’d much rather deal with those who are honest and true about their intentions than those who aren’t.
My guess is that this factored heavily into Henrik’s decision to leave in 2010 and (more recently) Simon Phipps’s decision to leave in August of this year. These are two people who I both highly respect, never have enough time to hang out with and I would completely trust to do the right thing and be honest when running anything in relation to free and open source software.
Maybe WebScaleSQL will succeed here – it’s a community with a purpose and several corporate contributors. A branch rather than a fork may be the best way to do this (Percona is rather successful with their branch too).
Of course, The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions. Also, these numbers should be considered preliminary, but trust me – I did get them and it’s not April 1st.
From my last post, you saw that with my preliminary patch for MySQL 5.7 to work on POWER, we could easily match the previous record for sysbench point select queries per second (i.e. key lookups). In fact, we could exceed the published record by a little bit which is kind of nice. At around 630kQPS, one could be rather happy.
But we still had 30-40% idle CPU on POWER8. This led me to file the following bug report:
- Bug 72829: LOCK_grant is major contention point, leaves 30-40% idle CPU.
What’s going on is that there’s a rwlock in the MySQL Server that ensures that writers don’t collide with readers to the data structures describing the GRANTs (i.e. who has access to what). If you run a GRANT statement, it gets a writer lock, and nobody can read (i.e. check permissions) while everything is being updated. If you run a normal SQL statement, you get a read lock (non-exclusive) and can check permissions appropriately.
It’s been known for a long time that LOCK_grant was a bottleneck. Typically, some people have run with skip-grant-tables to help shorten the time the lock as held (as in MySQL you still take the mutex even though you’ve started the server with skip-grant-tables).
In Drizzle, we fixed that – moving authentication and authorization completely behind plugin APIs and if you didn’t load plugins for them, you executed near enough to zero instructions that it didn’t matter.
In my experiments, enabling skip-grant-tables actually hurt performance rather than helped. More investigation is needed, but it seems that simply the act of acquiring and releasing the rdlock is now a major bottleneck in some benchmarks (such as sysbench point select).
It turns out that this is a well known problem in other pieces of software (e.g. Linux kernel) and is pretty much what RCU (Read Copy Update) is best at. As far back as 2006 I remember attempting to get my head around RCU so that one day we could use it in MySQL or MySQL Cluster.
Another simpler method is simply splitting the mutex, with readers able to acquire any one of N mutexes and writers needing to acquire them all. This penalizes writers, but unless you’re executing a lot of GRANTs, you’re probably safe.
So… what is the theoretical maximum performance if this bottleneck went away?
I wrote a quick patch that just commented out the rdlock acquisition of LOCK_grant in the hot codepath of sysbench point selects. I wasn’t running GRANT statements at runtime so this was “safe”.
This patch is not production ready, it’s merely useful for demonstrating where we could be with MySQL 5.7 on POWER8 if one last bottleneck is fixed.
My results? Slightly over ONE MILLION QUERIES PER SECOND!
This is roughly twice the previous record.
This is with a dual socket 24 core POWER8 with SMT8 and DSCR=1 on 8 tables with sysbench 0.4.8. Sysbench itself is using a non-trivial amount of CPU and I could probably decently beat this number if I rewrote sysbench using the nonblocking API in libdrizzle (back when me made the Drizzle performance regression tests use a libdrizzle-ified sysbench we got double digit percentage improvement in our sysbench numbers).
There’s still around 7-10% idle CPU time… so there’s more room to grow.
Lacking a physical gauntlet to throw down, I’ll just have to submit a conference paper somewhere so that I can do that in person.
I really hope that we’re able to fix this bottleneck in MySQL 5.7 so that MySQL 5.7 will ship being able to do over a million queries per second. From SQL.
There’s a pattern I keep seeing in threaded programs (or indeed multiple processes) writing to a common log file. This is more of an antipattern than a pattern, and is often found in code that has existed for years.
Basically, it’s having a mutex to control concurrent writing to the log file. This is something you completely do not need.
The write system call takes care of it all for you. All you have to do is construct a buffer with your log entry in it (in C, malloc a char or have one per thread, in C++ std::string may do), open the log file with O_APPEND and then make a single write() syscall with the log entry.
This works for just about all situations you care about. If doing multi megabyte writes (a single log entry with multiple megabytes? ouch) then you may get into trouble on some systems and get partial writes (IIRC it may have been MacOS X and 8MB) and O_APPEND isn’t exactly awesome on NFS.
But, if what you’re wanting to do is implement something like a general query log, a slow query log or something like that, then you probably want to use this trick rather than, say, taking a pthread_mutex lock while you do malloc(), snprintf() and write(2).
When refactoring parts of Drizzle, we found this done the wrong way in a whole bunch of places in the MySQL server, largely explaining why things like the slow query log and general query log were such a huge drain on database server performance.
It’d be neat to see someone fix that.
It may not be surprising that there’s been a few projects over the years that I’ve worked on where we’ve had to care about stack usage (to varying degrees).
For threaded userspace applications (e.g. MySQL, Drizzle) you get a certain amount of stack per thread – and you really don’t want to bust that. For a great many years now, there’s been both a configuration parameter in MySQL to set how much stack each thread (connection) gets as well as various checks in the source code to ensure there’s enough free stack to do a particular operation (IIRC open_table is the most hairy one of this in MySQL).
For the Linux Kernel, stack usage is a relatively (in)famous problem… although by now just about every real problem has been fixed and merely mentioning it is probably just the influence of the odd grey beard hairs I’m pretending not to notice.
In a current project I’m working on, it’s also something we have to care about.
It turns out that GCC has a few nice things to help you prevent unbounded stack usage or runaway stack usage. There’s two warnings you can enable.
There’s -Wstack-usage=len which will throw warnings on unbounded stack usage (e.g. array on stack sized based on an argument to the function), where stack usage is greater than len and when stack usage may exceed len.
There’s also -Wframe-larger-than=len which is based on calculation for a particular stack frame, as opposed to -Wstack-usage=len, which could be based on several stack frames.
Odds are, you may get some warnings in your project if you set this to what you would consider “conservative” values. Now, if this is every going to explode at runtime is something that’s left as an exercise for the reader, but enabling these warnings is pretty easy and a simple way to help find and prevent some issues.
After all, having your software explode for running off the end of the stack is just a tad embarrassing.
Since we had decided that Drizzle was going to be UTF-8 everywhere,(after seeing for years how hard it was for people to get character sets correct in MySQL) we soon added ☃.test to the tree, which tried a few interesting things:
CREATE TABLE ☃; CREATE DATABASE ☃; etc etc
Because what better to show off UTF-8 than using odd Unicode characters for table names, database names and file names. Well… it turns out we were all good except if you attempted to check out the source tree on Solaris. It was some combination of Python, Bazaar and Solaris that meant you just got python stacktraces and no source tree. So, if you look now it’s actually snowman.test and has been since the end of 2008, because Solaris 10.
A little while later, I was talking to Anthony Baxter at OSDC in Sydney and he mentioned Unicode above 2^16 in UTF-8…. so, we had clef.test (we’d learned since ☃ and we were not going to tall it 𝄢.test).
Fast forward a few years to, well, this week, and I was talking to Jeremy Kerr about petitboot and telling the tail of snowman.test. So out came the crazy Unicode characters:
- U+1F4A9 PILE OF POO 💩
- U+1F435 MONKEY FACE 🐵
- U+1F431 CAT FACE 🐱
- U+1F602 FACE WITH TEARS OF JOY 😂
- U+1F639 CAT FACE WITH TEARS OF JOY 😹
But guess what, there is no MONKEY FACE WITH TEARS OF JOY! I know, this is just unacceptable – TEARS OF JOY should be a modifier, because you may need U+1F6B9 MENS SYMBOL 🚹 with a TEARS OF JOY modifier at some point in your life.
Anyway, another place with tests involving odd Unicode characters is good for everyone, but still lacking if you need to boot an Operating System that’s MONKEY FACE WITH TEARS OF JOY.
Way back in May in the year 2000, a feature was added to MySQL that would keep many people employed for many years – replication. In 3.23.15 you could replicate from one MySQL instance to another. This is commonly cited as the results of two weeks of work by one developer. The idea is simple: create a log of all the SQL queries that modify the database and then replay them on a slave. Remember, this is before there was concurrency and everything was ISAM or MyISAM, so this worked (for certain definitions of worked).
The key things to remember about MySQL replication are: it was easy to use, it was easy to set up and it was built into the MySQL Server. This is why it won. You have to fast forward to September in 2010 before PostgreSQL caught up! It was only with PostgreSQL 9.0 that you could have queryable read-only slaves with stock standard PostgreSQL.
If you want to know why MySQL was so much bigger than PostgreSQL, this built in and easy to use replication was a huge reason. There is the age of a decent scotch between read-only slaves for MySQL and PostgreSQL (although I don’t think I’ve ever pointed that out to my PostgreSQL friends when having scotch with them… I shall have to!)
In 2001, when space was an odyssey, the first GA (General Availability) release of MySQL 3.23 hit the streets (quite literally, this was back in the day of software that came in actual physical boxes, so it quite probably was literally hitting the streets).
For a good piece of trivia, it’s 3.23.22-beta that is the first release in the current bzr tree, which means that it was around this time that BitKeeper first came into use for MySQL source code.
We also saw the integration of InnoDB in 2001. What was supremely interesting is that the transactional storage engine was not from MySQL AB, it was from Innobase Oy. The internals of the MySQL server were certainly not set up for transactions, and for many years (in fact, to this day) we talk about how a transactional engine was shoehorned in there. Every transactional engine since has had to do the same odd things to, say, find out when a transaction was being started. The exception here is in Drizzle, where we finally cleaned up a bunch of this mess.
Having a major component of the MySQL server owned and controlled by another company was an interesting situation, and one that would prove interesting in a few years time.
We also saw Mårten Mickos become CEO in 2001, a role he would have through the Sun acquisition – an acquisition that definitively proved that you can build an open source company and sell it for a *lot* of money. It was also the year that saw MySQL AB accept its first round of VC funding, and this would (of course) have interesting implications: some good, some less ideal.
(We’ll continue tomorrow with Part 3!)
I have put up a set of scripts on github: https://github.com/stewartsmith/bzr-to-git-conversion-scripts. Why do I need these? Well… if only bzr fast-export|git fast-import worked flawlessly for large, complex and old trees. It doesn’t.
Basically, when you clone this repo you can run “./sync-BLAH.sh” and it’ll pull BZR trees for the project, convert to git and clean things up a bit. You will likely have to edit the sync-BLAH.sh scripts as I have them pointed at branches on my own machine (to speed up the process, not having to do fresh BZR branches of MySQL trees over the network is a feature – it’s never been fast.). You’ll also want to edit the git remotes to point where you want git trees to end up.
I’ve done it for:
- Bazaar itself (sync-bzr.sh)
- Drizzle (sync-drizzle.sh)
- libeatmydata (sync-libeatmydata.sh)
- MySQL (sync-mysql.sh)
- Percona Server (sync-ps.sh)
- Percona XtraDB Cluster (sync-pxc.sh)
- Percona XtraBackup (sync-xb.sh)
What problems did I hit? Well… the first is performance, things are slow unless you tweak a bunch of knobs, and then it’s just rather slow rather than slow. So in the empty git repo I set core.compression=1, which makes zlib a whole lot faster.
I naturally give the correct incantation to bzr fast-export to munge tag names appropriately, set a git branch name (each BZR branch ends up as a git branch) and use a marks file (this speeds up incremental syncs).
For one of these branches I was importing, BZR had allowed the invalid committer of “billy-earney email@example.com\n <>” – yes, a newline in the committer. This messes up the fast-import format so I have to run the entire fast-export output through sed to clean it up.
We then use bzr fast-import-filter to apply a user map – which is me looking at the appropriate committers and cleaning them up so that we get better attribution in the resulting git trees as well as cleaning up some errors in the bzr tree so that Git likes them (most notably, missing < or (not and) > around email addresses). The user map is fairly Percona specific, but there’s at least one or two for Oracle committers too.
Next, I pass the output through pv(1) – to do two things: monitor the output to see that it’s still going, and to have a transfer buffer so that git fast-import doesn’t stall waiting for output – amazingly enough, this gave a decent speed boost to import speed.
Finally, when we’re done doing the import of all of the revisions for all of the bzr branches, if this is our first run, we set the HEAD ref to the last BZR branch name and then do a git repack. Through experimentation, I’ve found that “git repack -AdfF –depth=100 –window=500” is what gives me the smallest size possible.
I’ve used the Bazaar (bzr) version control system since roughly 2005. The focus on usability was fantastic and the team at Canonical managed to get the entire MySQL BitKeeper history into Bazaar – facilitating the switch from BitKeeper to Bazaar.
There were some things that weren’t so great. Early on when we were looking at Bazaar for MySQL it was certainly not the fastest thing when it came to dealing with a repository as large as MySQL. Doing an initial branch over the internet was painful and a much worse experience than BitKeeper was. The work-around that we all ended up using was downloading a tarball of a recent Bazaar repository and then “bzr pull” to get the latest. This was much quicker than letting bzr just do it. Performance for initial branch improved a lot since then, but even today it’s still not great – but at least it isn’t terrible like it once was.
The integration with Launchpad was brilliant. We never really used it for MySQL but for Drizzle the combination was crucial and helped us get releases out the door, track tasks and bugs and do code review. Parts of launchpad saw great development (stability and performance improved immensely) and others did not (has anything at all changed in blueprints in the past 5+ years?). Not running your own bugs db was always a win and I’m really sad to say that I still think Launchpad is the best bug tracker out there.
For both Drizzle and Percona, Bazaar was the right option as it was what MySQL was using, so people in the community already knew the tools. These days however… Git is the tool that there’s large familiarity with – even to the extent that Twitter maintains their MySQL branch in Git rather than in bzr.Is Bazaar really no longer being developed? Here are graphs (from github actually) on the activity on Bazaar itself over the years:You can easily see the drop off in commits and code changes. The last commit to trunk was 2 months ago and although there was the 2.6.0 release in August, in my opinion it wasn’t a very strong one (the first one I’ve had problems with in years).So… git is the obvious successor and with such a strong community around GitHub, it kinda makes sense. I’m not saying that GitHub has caught up to Launchpad in terms of features or anything – it’s just that with Bazaar clearly no longer really being developed…. it may be the only option.In fact, in my experiment of putting a mirror of Percona Server on GitHub, we already have a pull request mere days after I blogged about it. Migrating all of Percona development over to Git and Github may take some time, but it’s certainly time that we kicked the tyres on it and worked out how we’d do it without interrupting releases or development.I’ve also thrown up a Drizzle tree and although it required some munging to get the conversion to happen, I’m kind of optimistic about it and I think that after a round of merging things, I’m tempted to very strongly advocate for us switching (which I don’t think there’ll be any opposition to).When will Oracle move over their MySQL development? This I cannot say (as I don’t know and don’t make that call for them). There is a lot of renewed interest in code contribution by Oracle and moving to Git and GitHub may well be a very good way to encourage people.
The downside of git? Well… With BZR you could get away with not understanding pretty much every single bit of the internals. With git, I wish I was so lucky.
I’ve been mirroring a bunch of projects that have their source control in BZR up onto github recently. This turns out to be a bit harder than it sounds for a bunch of reasons that aren’t particularly interesting (although having a commit in the bzr repo where the name of the committer has a newline in it is among the more interesting).
Run on over to https://github.com/stewartsmith/drizzle to check it out. I’ve put up Drizzle 7.0, 7.1 and 7.2 branches.
There was some suggestion after my previous post (Who works on MariaDB and MySQL?) that I look at MariaDB 10.0 – so I have. My working was very simple, in a current MariaDB 10.0 BZR tree (somewhat beyond 10.0.3), I ran the following command:
bzr log -n0 -rtag:mariadb-10.0.0..|egrep '(author|committer): '| \ sed -e 's/^\s*//; s/committer: //; s/author: //'| \ sort -u|grep -iv oracle
- Alexander Barkov
- Alexey Botchkov
- Daniel Bartholomew
- Elena Stepanova
- Igor Babaev
- Jani Tolonen
- Michael Widenius
- Sergei Golubchik
- Sergey Petrunya
- Sergey Vojtovich
- Vladislav Vaintroub
- Kentoku SHIBA (4 commits)
- Lixun Peng (1 commit)
- Olivier Bertrand (212 commits)
From Oracle (i.e. revisions merged from Oracle MySQL):
- 81 names (which I won’t list here as 81 is a lot)
The results are no different if you go back to the first revision that is different between MariaDB 5.5 and 10.0 (found using bzr missing). Even when grepping through the bzr log for things such as “patch by”, “contribution” or “originally” I can only find 1 or two more names as original authors for patches (about the same as I can for patches going into the Oracle tree).
Please point me to revisions (revid is best way) that come from outside contributors as then I really can update this to show that there’s a larger developer community.
The current development version of Drizzle (7.2) has just as many contributors as the MariaDB development version (10.0) – although Drizzle does have fewer commits.
I recently got pointed towards https://github.com/shodanium/nanomysql/ which is a tiny (less than 400 lines of C++) MySQL client library which is GPL licensed.
If you need to link into non-GPL compatible code, there is the (slightly larger and full featured) libdrizzle library. But if you want something *tiny* and are okay with GPL, then nanomysql may be something to look at.
There’s a big difference in how plugins are treated in MySQL and how they are treated in Drizzle. The MySQL way has been to create a C API in front of the C++-like (I call it C- as it manages to take the worst of both worlds) internal “API”. The Drizzle way is to have plugins be first class citizens and use exactly the same API as if they were inside the server.
This means that MySQL attempts to maintain API stability. This isn’t something worth trying for. Any plugin that isn’t trivial quickly surpasses what is exposed via the C API and has to work around it, or, it’s a storage engine and instead you have this horrible mash of C and C++. The byproduct of this is that no core server features are being re-implemented as plugins. This means the API is being developed in a vacuum devoid of usefulness. At least, this was the case… The authentication plugin API seems to be an exception, and it’s interesting to note that semisync replication is in fact a plugin.
So times may be changing… sort of. Yesterday I noted that some storage engine API features are only available if you’re InnoDB and I’ve voiced my general disappointment in the audit API being unsuitable to implement various forms of query logging already in the server (general query log, slow query log).
One thing to note: when the API is the same for both inside the server and a plugin, it makes initial refactoring very easy, and you quickly see the bits that could be improved.
I recently wrote about Where are they now: MySQL Storage Engines and The MERGE storage engine: not dead, just resting…. or forgotten. Today, it’s the turn of the MEMORY storage engine – otherwise known as HEAP.
This is yet another piece of the MySQL server that sits largely unmaintained and unloved. The MySQL Manual even claims that it supports encryption… with the caveat of having to use the SQL functions for encryption/decryption rather than in the engine itself (so, basically, it supports encryption about as much as every other engine does).
The only “recent” innovation in the MEMORY engine was the dynamic row patch that ended up making its way into Percona Server (and isn’t enabled by default). This forced me to go and look at the code of the MEMORY engine again and I cannot possibly drink enough in my lifetime to erase the memory.
The MEMORY engine is used by just about everybody as you probably have a SQL query somewhere that uses an in memory temporary table. I can, however, feel the comments being added to this post right now by people who use gdb to set server variables that not a single query in their systems use MEMORY….. (IIRC there have been some patches around that would throw an error rather than create a temporary table)
We had a early version of the dynamic row format patch in Drizzle for a while… and if you turned it on, all sorts of things horrifically broke. It was a remarkably non-trivial amount of work to get that code to work properly and this is largely a testament to the “design” of the MEMORY engine.
While it may be efficient or fast or something (likely on 1990s hardware and workloads), it misses the boat completely on the things that matter today: simultaneous access, MVCC, BLOB/TEXT columns and transactions. Basically, it’s a engine that’s really only useful for a single connection in limited use cases…. and even then, it’s likely a good way to ruin things. MyISAM is better as at least on memory pressure things may be written out to disk sensibly…. and if InnoDB had a “don’t log this table” mode it would beat that absolute pants off it.
It is, again, another part of the MySQL server that’s remarkably hard to pull out and replace with something different/better. Why? Well, I wrote about it before: Refactoring Internal temporary tables (another stab at it). If it was easy, we’d likely have Tokyo Cabinet (via BlitzDB) or similar (some bit of code maintained by other people) doing the same job in Drizzle rather than this large chunk of code that nobody really cares about.
Following on from my fun post on Where are they now: MySQL Storage Engines, I thought I’d cover the few storage engines that are really just interfaces to a collection of things. In this post, I’m talking about MERGE.
The MERGE engine was basically a multiplexer down to a number of MyISAM tables. They all had to be the same, there was no parallel query execution and it saw fairly limited use. One of the main benefits was that then you could actually put more rows in a MyISAM table than your “files up to 2/4GB” file system allowed. With the advent of partitioning, this really should have instantly gone away and been replaced by it. It wasn’t.
It is another MySQL feature that exists likely due to customer demand at the time. It’s not a complete solution by any means, PARTITIONING is way more complete and universal…. and much harder to get right inside the MySQL server – which is why MERGE exists. It was easier to write a storage engine that wrapped MyISAM than it was to have any form of partitioning in the server.
One advantage of MERGE tables is it means that you could parallelize myisamchk to repair your broken MyISAM tables after a crash. One step better than no crash safety is at least parallel recovery. The disadvantage being that you’re using MERGE and MyISAM tables.
There is also the great security problem of MRG_MYISAM (the other name for MERGE tables): if you create a MyISAM table t1 and have a user able to access it, if they can create a MERGE table that accesses t1 (say m1) and you then revoke their access to t1, they’ll still be able to access t1 through m1.
MERGE still seems to exist in MySQL 5.6 without even a warning that it’ll go away… which I suspect it will…. we long since got rid of it in Drizzle as, well, what you really want is a query rewrite engine that does views, partitioning etc etc.
Can anyone think of a reason why you should still use MERGE tables in 2013? I can’t.
There was once a big hooplah about the MySQL Storage Engine Architecture and how it was easy to just slot in some other method of storage instead of the provided ones. Over the years I’ve repeatedly mentioned how this wasn’t really the case and that it was remarkably non trivial.
Over the years there have been many storage engines crop up and then disappear. So… where are they now?
This became MyISAM…. you know you’ve been around MySQL a long time if you’ve ever had to deal with an ISAM table.
This was the first big test of the GPL in court. Basically, you have to obey the GPL (see wikipedia for more info). The code was released as GPL and development stopped. This has been dead since ca 2002.
- Amira – http://launchpad.net/amira
Antony first mentioned this in 2008 on his blog. This was a continuation of the Gemini engine, you can actually go over to launchpad and get the code. This was one of the projects to have a transactional storage engine not owned by Oracle after Innobase Oy was acquired by them. It went nowhere special as Netfrastructure was acquired which became Falcon.
otherwise known as the BerkeleyDB engine. It was seldom used and never gained much of a userbase. It was unceremoniously dropped back in 2006 and both users didn’t really exist.
- PBXT – http://pbxt.blogspot.com/
I think we can credit PBXT with at least half of the features and performance improvements to InnoDB since it first emerged back in 2006. It got attention very quickly. Why? Because it was different. It had the very rare ability to outperform InnoDB in some places. You can still find PBXT in MariaDB, but sadly it can be hard to fund development of a MySQL storage engine, especially one as tied to MySQL as PBXT is, and it’s no longer under active development. Closely related was the Blob Streaming project which was way ahead of its time as an AlsoSQL access method. The good news is that the code was released under a BSD license in 2012 (was previously GPL). We even had PBXT in Drizzle for a while.
- Blob Streaming (PBMS) – http://bpbdev.blogspot.com/
This project was closely related to (but not depending exclusively on) PBXT. It embedded a HTTP server inside the database and could use it to read and write BLOBs. This was not only fairly cool but way ahead of its time. We owe the existence of both HandlerSocket and the memcached interface to InnoDB to PBMS (it was also an inspiration for the JSON server plugin for Drizzle, to address some of the use cases of the PBMS plugin).
It’s still there… but is effectively unmaintained and dead. There’s even FederatedX in MariaDB which is an improvement, but still, the MySQL server really doesn’t lend itself kindly to this type of engine… it’s always been an oddity only suitable for very specific tasks.
Although useful, effectively unmaintained. I kinda don’t want to say dead… but if it went away, I wouldn’t exactly be surprised.
Currently used to access the log tables in MySQL… and hardly used otherwise. It’s odd that the same code doesn’t deal with SELECT INTO OUTFILE and LOAD DATA INFILE, and I doubt this will ever change. I’d say effectively niche/dead.
Purchased by IBM, abandoned.
Only ever on System i. Useful for very very few people… but you can still find it around if you’re one of them.
OMG it exists! This is probably because they’re largely just using the MySQL server as a way to implement the MySQL network protocol and all of the heavy lifting is done by their own code.
I’m quite surprised these guys are still around, as they’re a proprietary storage engine as a service, and initial testing wasn’t entirely promising.
I cannot emphasize how much more interesting TokuDB would be if it were open source. It actually holds some promise… and with their recent work with mongo, perhaps this is a good way forward for them…
Another “OMG Oracle just bought Innobase Oy” engine. This was a project to take MyISAM and turn it into a lean, mean, transactional storage engine machine. It’s still not there and I don’t think it ever will be.
This was the hot new thing. It came out of Netfrastructure, which MySQL AB acquired in order to help get a transactional storage engine after Innobase Oy was acquired by Oracle. If you’re keeping count, that’s three projects for a transactional storage engine. Falcon was the star though, receiving all the press and publicity (well before it was ready). There are many reasons why Falcon isn’t around today – the chief one probably being that Oracle bought Sun who had bought MySQL and thus a need for an “InnoDB replacement” instantly vanished. There was also immense management pressure for performance to be greater than InnoDB, without any allowance for or focus on correctness…. and this showed. This was quite disappointing as Falcon had a lot of good architectural things going for it.
- BlitzDB – https://launchpad.net/blitzdb
I had hoped we’d replace MyISAM with BlitzDB in Drizzle. It was a wrapper around Tokyo Cabinet to the storage engine API in Drizzle. Unfortunately, the ties to MyISAM are incredibly deep (see my recent post on internal temporary tables) and we never quite got there.
I think this is all the notable engines that were aimed at widespread adoption… what ones have I forgotten?
It’s interesting to note that only Archive, CSV, Xeround, TokuDB and Infobright can be gotten anywhere, and the latter two only in their own distribution (one proprietary) and Xeround only as a service.
While Domas may have rather effictively trolled the discussion with his post on howto configure table/user statistics (which gave me a good chuckle I do have to say), it’s at least incorrect for Percona Server as you have to enable the “userstat” server option :)
That being said, once enabled there are no extra configuration variables to think about. This is a huge advantage over configuring PERFORMANCE_SCHEMA – which has a total of THIRTY configuration options (31 if you include the global enable/disable option).
Some of these thirty odd configuration variables are only going to matter if you’re loading your own plugins, and even then, it’s probably only going to matter if they use the MySQL mutex implementations rather than, say, the standard pthread ones or even other synchronization primitives. It helps that the vast majority of non-InnoDB storage engines are dead. Go on – name one that’s in any form of usage (MyISAM doesn’t count – it’s effectively on death row).
This really makes me want to go and resurrect and finish the perf integration with Drizzle. The operating system provides a whole bunch of performance monitoring tools already, just expose them via SQL and be done with it.
I’m really glad to come back from vacation to discover that we’ve been accepted into GSoC 2013. If you’re interested in being a student or mentoring, head over to: Drizzle Database in GSOC 2013
Those interested in hacking on Drizzle for the Google Summer of Code this year should certainly jump on the mailing list and IRC channel and work out what they may want to do. I (and others) are happy to supervise this year.
Drizzle is a relational database server with an accessible C++ code base that has been involved with GSoC for a number of years now.
A few weekends ago, I started to again look at the code in Drizzle for producing internal temporary tables. Basically, we have a few type of tables:
- Temporary (from CREATE TEMPORARY TABLE)
- Temporary (from ALTER TABLE)
- Internal temporary (to help with query execution)
If you’re lucky enough to be creating one of the first three types, you go through an increasingly lovely pile of code that constructs a nice protobuf message about what the table should look like and hands all responsibility over to the storage engine as to how to do that. The basic idea is that Drizzle gets the heck out of the way and lets the storage engine do its thing. This code path looks rather different than what we inherited from MySQL. For a start, we actually have a StorageEngine object rather than just lumping everything into the handler (which we correctly name a Cursor). However… the final part, the internal temporary table code is a bit closer to what we inherited from MySQL. There is a good reason for that, it’s ass.
For a start, the table::Singular object is still abused by Item_sum_distinct (see the setup() method) as a tuple (a table with no actual table). This is not ideal and just throws a spanner in the works for refactoring a bunch of code.
The second big problem is that create_tmp_table() doesn’t actually use any normal API calls, instead it manually sets up the table::Singular object. This includes setting up the fields for the table::Singular object in a slightly different way depending on which bit of code called create_tmp_table().
The third big problem is that it’s not storage engine agnostic. Instead of using any existing and sensible way to go and create a temporary table by using the storage engine API it instead creates a series of MI_COLUMNDEF structures which as you may be able to guess, are MyISAM specific and internal data structures.
The forth big problem is that if we end up using HEAP (again, like MyISAM, hard coded) we don’t even call the create table method on the engine. The HEAP (or MEMORY engine as it’s now known) is magic in that it can create tables on open()!
All of these issues make it really, really hard to have another engine with the ability to handle internal temporary tables. You may recall that MariaDB does include the ability to use the Aria engine for internal temporary tables. No, they did not refactor any of this code, they just made a copy of the code and put in Aria where MyISAM was along with some #ifdef for the feature.
Over the past several years I’ve tried a few times to tease this code out and start the process of turning it into something that is palatable. Every one of those times I’ve either failed or gotten sufficiently frustrated that I’ve given up.
I now have a new strategy though. After looking at the code for a good few hours a few weekends ago, I think I have an idea of where to start…. (now just for a few more free weekends to implement it).