Things I’ve done in Drizzle

When writing my Dropping ACID: Eating Data in a Web 2.0 Cloud World talk for LCA2011 I came to the realisation that I had forgotten a lot of the things I had worked on in MySQL and MySQL Cluster. So, as a bit of a retrospective as part of the Drizzle7 GA release, I thought I might try and write down an (incomplete) list of the various things I’ve worked on in Drizzle.

I noticed I did a lot of code removal; that’s all fine and dandy, but maybe I won’t list all of that… except perhaps my first branch that was merged :)

2008:

  • First ever branch that was merged: some mysys removal (use POSIX functions instead of wrappers that sometimes have different semantics than their POSIX counterparts), some removal of NETWARE support, build scripts that weren’t helpful (i.e. weren’t what any build team ever used to build a release) and some other dead code removal.
  • Improve ‘make test’ time – transactions FTW! (this would be a theme for me over the years, I always want build and test to be faster)
  • Started moving functions out into their own plugins, removing the difference between UDFs (User Defined Functions) and builtin functions. One API to rule them all.
  • Ahhh compiler warnings (we now build with -Werror) and unchecked return codes from system calls (we now add this to some of our own methods, finding and fixing even more bugs). We did enable a lot of compiler warnings and OMG fix a lot of them.
  • Removal of FRM – use a protobuf message to describe the table. The first branch that implemented any of this was merged mid-November. It was pretty minimal, and we still had the FRM around for a little while yet – but it was the beginning of the end for features that couldn’t be implemented due to limitations in the FRM file format. I wrote a few blog entries about this.
  • A lot of test fixes for Drizzle
  • After relating the story of ☃.test (hint: it broke the ability to check out the source tree on Solaris) to Anthony Baxter, he suggested trying something rather nasty… a unicode character above 2^16 – I chose 𝄢 – which at the time didn’t even render on Ubuntu – this was the test case you could only see correctly on MacOS X at the time (or some less broken Linux distro I guess). Some time later, I was actually able to view the test file on Ubuntu correctly.
  • I think it was November 1st when I started to work on Drizzle full time – this was rather awesome, although a really difficult decision as I did rather enjoy working with all the NDB guys.

2009:

  • January sparked the beginning of reading table information from the table protobuf message instead of the FRM file. The code around the FRM file was lovely and convoluted in places – to this day I’m surprised at the low bug count all that effort resulted in. One day I may write up a list of bugs and exploits probably available through the FRM code.
  • My hate for C++ fstream started in Feb 2009
  • In Feb I finally removed the code to work out (and store) in the FRM file how to display a set of fields for entering data into the table on a VT100 80×24 screen.
  • I filed my first GCC bug. The morning started off with a phone call and Brian asking me to look at some strange bug and ended with Intel processor manuals (mmm… the 387 is fun), the C language standard and (legitimately) finding it was a real compiler bug. Two hours and two minutes after filing the bug there was a patch to GCC fixing it – I was impressed.
  • By the end of Feb, the FRM was gone.
  • I spent a bit of time making Drizzle on linux-sparc work. We started building with compiler options that should be much friendlier to fast execution on SPARC processors (all to do with aligning things to word boundaries). -Wcast-align is an interesting gcc flag.
  • Moved DDL commands to be in StorageEngine rather than handler for an instance of an open table (now known as Cursor)
  • MyISAM and CSV as temporary table only engines – this means we save a bunch of mucking about in the server.
  • Amazingly enough, sprintf is dangerous.
  • Moved to an API around what tables exist in a database so that the Storage Engines can own their own metadata.
  • Move to have Storage Engines deal with the protobuf table message directly.
  • Found out you should never assume that your process never calls fork() – if you use libuuid, it may. If it wasn’t for this and if we had param-build, my porting of mtr2 to Drizzle probably would have gone in.
  • There was this thing called pack_flag – its removal was especially painful.
  • Many improvements to the table protobuf message (FRM replacement) code and format – moving towards having a file format that could realistically be produced by code that wasn’t the Drizzle database kernel.
  • Many bug fixes, some caused by us, others exposed by us, others had always been there.

2010:

  • embedded_innodb storage engine (now HailDB). This spurred many bug and API fixes in the Storage Engine interface. This was an education in all the corner cases of the various interfaces, where the obvious way to do things was almost always not fully correct (but mostly worked).
  • Engine and Schema custom KEY=VALUE options.
  • Found a bug in the pthread_mutex implementation of the atomics<> template we had. That was fun tracking down.
  • Started writing more test code for the storage engine interface and upper layer. With our (much improved) storage engine interface, it became relatively easy to implement a storage engine to test specific bits of the upper layer.
  • Wrote a CREATE TABLE query that would take over four minutes to run. Fixed the execution time too. This only existed because of a (hidden and undocumented) limitation in the FRM file format to do with ENUM columns.
  • Characters versus bytes is an important difference, and one that not all of the code really appreciated (or dealt with cleanly)
  • FOREIGN KEY information now stored in the table protobuf message (this was never stored in the FRM).
  • SHOW CREATE TABLE now uses a library that reads the table protobuf message (this will later be used in the replication code as well)
  • Started the HailDB project
  • Updated the innobase plugin to be based on the current InnoDB versions.
  • Amazingly, noticed that the READ_COMMITTED isolation level was never tested (even in the simplest way you would ever explain READ_COMMITTED).
  • Created a storage engine for testing called storage_engine_api_tester (or SEAPITester). It’s a dummy engine that checks we’re calling things correctly. It exposed even more bugs and strangeness that we weren’t aware of.
  • Fixed a lot of cases in the code where we were using a large stack frame in a function (greater than 32kb).
  • An initial patch using the internal InnoDB API to store the replication log

2011:

  • This can be detailed later, it’s still in progress :)
  • The big highlight: a release.

Drizzle7

We’ve released Drizzle7! Not only that, we’re now calling it Generally Available – a GA release.

What does this mean? What does this GA label mean?

You could view the GA label as meaning “we’re pretty confident people aren’t going to en masse ask for our heads when they start using it”… which isn’t too bad a description. We also plan to maintain it; there could be future releases in this series that just include bug fixes – we won’t just immediately tell you to go and use the latest tarball or bzr tree. This release series is a good one to use.

Drizzle7 is something that can be packaged in Linux distros. It’s no longer something where the best bet is to add the PPA and upgrade every two weeks or build from source yourself. If you’re looking to deploy Drizzle (or develop against it) – you can rely on this release.

I’ll never use the words “production ready” to describe a release – it’s never up to me. It’s up to each person or organisation looking to deploy a piece of software to decide if that bit of software is production ready for them.

Personally, I’m looking forward to seeing how people can break it. While Drizzle is the best-tested FOSS SQL RDBMS server, I’m sure there are new and interesting ways it can be broken now that we’re saying we’re ready for a much larger crowd to hammer on it.

Overall, I think we’ve managed to take the now defunct MySQL 6.0 tree (way back in 2008) and release something that can truly live up to the line “database for cloud”. Drizzle is modern, modular, rather solid and understandable. The future is bright, there is so much more to do to make the ultimate database for cloud. Drizzle7 is a great platform to build on – both for us (developers) and us (people who use relational databases).

Fixed in Drizzle: No more “GOTCHA’s”

O'Reilly MySQL Conference & Expo 2011

At the upcoming MySQL Conference and Expo, I’m going to give a Thursday afternoon (2pm) session entitled Fixed in Drizzle: No more “GOTCHA’s”. I plan to have a lot of fun with this session.

If you go back to the very start of when I started submitting code to Drizzle (June 2008) – I was going and fixing some of my favourite “gotcha’s” inside the code: BUILD/ scripts that didn’t build the way releases would, wrappers on POSIX functions with different (and inconsistent) semantics, NETWARE support, a non-thread-safe client lib, my_errno (different to errno) etc. I won’t really be talking about internals like this – it may give me a happy but really isn’t the latest awesome in databases.

I’ll instead be going over the way more awesome user- and DBA-visible things we’ve fixed/added/removed from Drizzle that make it a database with as few GOTCHA’s as possible.

Authentication (pluggable, LDAP), Logging (to syslog, gearman), DATA_DICTIONARY, INFORMATION_SCHEMA, engines owning their own metadata, STRICT mode by default, removing global mutexes, improving the Storage Engine API, improving the replication log, including code such as PBXT and PBMS Blob Streaming, filesystem engine (read files from disk like a table), pluggable protocol, UTF8 by default, ENUM data type, auto_increment behaviour.

All this and more is “Fixed in Drizzle”.

(oh, and there’s no 24bit integer or a BLOB that can only be 255 bytes)

Undocumented ALTER TABLE that does *nothing* (useful)

(at least since MySQL 5.1.42)

alter table t1 force;

Pretty neat huh? In fact, in Drizzle this will end up doing a copying alter table. Not useful.

There’s an over-four-year-old bug report in MySQL (Bug#24091).

I’m just going to remove that bit from the parser in Drizzle – it makes no sense.

SQL Oddity: ALTER TABLE and default values

So, the MySQL (and Drizzle) ALTER TABLE syntax allows you to easily change the default value of a column. For example:

CREATE TABLE t1 (answer int);
ALTER TABLE t1 ALTER answer SET DEFAULT 42;

So, you create a TIMESTAMP column and forget to set the default value to CURRENT_TIMESTAMP. Easy, just ALTER TABLE:

create table t1 (a timestamp);
alter table t1 alter a set default CURRENT_TIMESTAMP;

(This is left as another exercise for the reader as to what this will do – again, maybe not what you expect)

ALTER TABLE RENAME RENAME RENAME

Here’s a nice challenge for you. What does the following do (or error out on)?

CREATE TABLE t1 (a int);
CREATE TABLE t2 (b int);
ALTER TABLE t1 RENAME t3, RENAME t2, RENAME t4;

I’d be interested to know a) what you think it does, and b) if you were surprised when you went and typed it into your RDBMS of choice.

Timing queries in the 21st century (with LD_PRELOAD and sed)

So… Baron blogged about wanting higher precision timers from the mysql binary and that running sed on the binary wasn’t cutting it. However… I am not one to give up that easily!

This is what LD_PRELOAD was made for! Evil nasty hacks to make your life easier!

By looking at the mysql.cc source code, I can easily work out how this works… I just have to override two calls: sysconf() (to fake how many ticks per second there are) and times() (to return a much higher precision number).
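
The whole preload library is only a few lines. Here’s a minimal sketch (assuming glibc on Linux; the real, tested version is the junkcode link below):

#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for RTLD_NEXT */
#endif
#include <dlfcn.h>
#include <unistd.h>
#include <sys/times.h>
#include <time.h>

/* Build: g++ -shared -fPIC timer_hack.cc -o timer_hack.so -ldl -lrt
   Use:   LD_PRELOAD=./timer_hack.so mysql ... */

extern "C" long sysconf(int name) throw ()  /* throw() matches glibc's declaration */
{
  if (name == _SC_CLK_TCK)
    return 1000000L;   /* lie to the client: a million ticks per second */

  typedef long (*sysconf_fn)(int);
  sysconf_fn real_sysconf= (sysconf_fn) dlsym(RTLD_NEXT, "sysconf");
  return real_sysconf(name);   /* pass everything else to the real one */
}

extern "C" clock_t times(struct tms *buf) throw ()
{
  (void) buf;   /* the mysql client only looks at the return value */
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  /* return microsecond "ticks", matching the CLK_TCK we faked above */
  return (clock_t) (ts.tv_sec * 1000000L + ts.tv_nsec / 1000L);
}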

Combined with the sed hack on the binary to change the sprintf call to print out the higher precision number, we have:

mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
|   710720 |
+----------+
1 row in set (1.080110 sec)

Get it from my junkcode: http://www.flamingspork.com/junkcode/high_precision_mysql_timer.c

(or you can get it from http://paste.drizzle.org/show/364/)

Implicit COMMIT considered harmful.

If you execute the following, what does your RDBMS do?

CREATE TABLE t1 (a int);
START TRANSACTION;
INSERT INTO t1 (a) VALUES (1);
START TRANSACTION;
INSERT INTO t1 (a) VALUES (2);
ROLLBACK;
SELECT * FROM t1;

The answer may surprise you.

Is your Storage Engine buggy or the database server?

If your storage engine returns an error from rnd_init (or doStartTableScan as it’s named in Drizzle) and does not save this error and return it in any subsequent calls to rnd_next, your engine is buggy. Namely it is buggy in that a) an error may not be reported back to the user and b) everything may explode horribly when rnd_next is called after rnd_init returned an error.

Unless it is running on MariaDB 5.2 or (soon, when the patch hits the tree) Drizzle.
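
The engine-side fix is simple enough. A minimal sketch (made-up names, not any engine’s actual source): save the error from starting the scan, and keep returning it.

class ExampleCursor
{
  int scan_error;   /* result of starting the scan, saved for later */
public:
  ExampleCursor() : scan_error(0) {}

  int doStartTableScan(bool)
  {
    scan_error= 0;   /* engine-specific scan setup would go here and
                        set scan_error on failure */
    return scan_error;
  }

  int rnd_next(unsigned char *buf)
  {
    if (scan_error != 0)
      return scan_error;   /* keep reporting the saved error, since the
                              server may call us anyway */
    (void) buf;   /* engine-specific "fetch next row into buf" goes here */
    return 0;
  }
};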

Monty (Widenius, not Taylor) wrote a patch for MariaDB based on my bug report that addressed that problem. It uses the compiler feature that warns when the result of a function isn’t checked, to make sure that all places calling rnd_init check for an error from the engine.
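
My assumption is that the feature in question is GCC’s warn_unused_result attribute, which looks something like this:

#if defined(__GNUC__)
# define MUST_CHECK __attribute__((warn_unused_result))
#else
# define MUST_CHECK
#endif

class CursorLike   /* illustrative stand-in, not the real class */
{
public:
  MUST_CHECK int rnd_init(bool scan);
};

/* any caller that ignores the return value of rnd_init() now gets a
   warning - and an error when building with -Werror */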

Today I (finally) pulled that into Drizzle as well.

So… if your engine does the logical thing and goes “oh look, this method returns an error… I’ll return my error” it will exhibit bugs in MySQL but not MariaDB 5.2 or Drizzle (once the patch hits).

Which is buggy, the server or the engine?

The MySQL bug number is 54166, filed in June 2010.

Persistent index statistics for InnoDB

In browsing the BZR tree for lp:mysql-server, I noticed some rather exciting code had been merged into the Innobase code.

You may be aware that InnoDB will do some index dives when opening a table to get some statistics about the indexes that can help the optimiser make good query plans.

The problem is that this takes many disk seeks: on server restart, you have to spend a whole bunch of time seeking around the disk reading index pages.

Not any more.

There is now code merged in to store the calculated statistics in a table inside InnoDB so that these index dives don’t have to happen on startup.

Originally, this looked like it was going to make it into InnoDB+. The good news is that it’s now in a public source tree. I look forward to when it hits a stable release.

(hopefully somebody other than me can beat me to it and write a nice description of the algorithms involved… the code is pretty easy to follow, so it shouldn’t be hard)

A more complete look at Storage Engine API

Okay… So I’ve blogged many times before about the Storage Engine API in Drizzle. This API is somewhat inherited from MySQL, though we have very much attempted to make it a much cleaner interface. Our goals in making changes include: making it much easier to write and maintain a storage engine, making the upper layer code obviously correct and clear in what it’s doing, and being able to more easily introduce optimisations.

I’ve recently added a Storage Engine that is only used in testing: storage_engine_api_tester. I’ve blogged before on it producing call graphs (really state transition graphs), both for Storage Engine and Cursor.

I’ve been expanding the test. My test engine is now a wrapper around a real engine instead of just a fake one. This lets us run real queries (and test cases) while testing what’s going on. At some point in the near future I plan to make it so that it will be able to log what calls go on to the engine and produce a graph just of those.
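
The shape of the wrapper is roughly this (illustrative names, not Drizzle’s actual classes):

#include <cstdio>

class RealCursor   /* stand-in for the wrapped (real) engine's cursor */
{
public:
  int doStartTableScan(bool) { return 0; }
  int rnd_next(unsigned char *) { return 0; }
};

class WrapperCursor
{
  RealCursor inner;
public:
  int doStartTableScan(bool scan)
  {
    printf("doStartTableScan\n");          /* log (and check) the call... */
    return inner.doStartTableScan(scan);   /* ...then pass it through */
  }
  int rnd_next(unsigned char *buf)
  {
    printf("rnd_next\n");
    return inner.rnd_next(buf);
  }
};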

I added a lot more to the Storage Engine part of the wrapper. Below is the current graph:

I’ve coded what I consider to be bugs as red and what I consider suspect as blue.

Also for the Cursor (colours mean the same):

As you can see, there are currently some wacky possibilities. I’m investigating exactly what’s going on here – if I’m somehow missing some calls that I should be wrapping (I don’t think so) or if we are really doing some dumb-ass things in the upper layer.

Also, please do not be under any impression that any of this means that we’re going to have a stable API. We’re not. To stabilise on this would just be insane – way too much of it still doesn’t make much sense.

Cursor states

Following on from my post yesterday on the various states of a Storage Engine, I said I’d have a go with the Cursor object too. A Cursor is used by the Drizzle kernel to get and set data in a table. There can be more than one cursor open at once, and more than one per thread. If your engine cannot cope with this, it is its responsibility to figure it out and return the appropriate errors.

Let’s look at a really simple operation, inserting a couple of rows and then reading them back via a full table scan.

Now, this graph is slightly incomplete as there is no doEndTableScan() call. But you can see in which order things are meant to happen. In this case, “store_lock()” means that store_lock() has been called, so when coming back from doInsertRecord() we do not call store_lock() again; rather, we’re just in a state where it has already been executed.

For MySQL handler, think ::write_row() for doInsertRecord() and ::rnd_init() for doStartTableScan().

This diagram was again auto-generated from my test engine.

Storage Engine API state graph

Drizzle still has a number of quirks inherited from the MySQL Storage Engine API (e.g. BLOBs, row buffer, CREATE SELECT and lack of DDL transaction boundaries, key tuple format). One of the things we fixed a long time ago was to have proper methods for StorageEngines to be called for: startTransaction, startStatement, endStatement, commit and rollback.

If you’ve had to implement a transactional storage engine in MySQL you will be well aware of the pattern of “in every Storage Engine/handler call: if transaction doesn’t exist, begin.” We’ve tried to fix this in the Drizzle API for a number of reasons. I think having this obvious set of calls will make the API a lot easier to understand. I am also very interested in making things much easier to prove correct.
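
For contrast, here’s a sketch (with made-up names) of the old pattern, where transaction start is buried inside every entry point instead of being an explicit call from the kernel:

class OldStyleEngine
{
  bool txn_started;
public:
  OldStyleEngine() : txn_started(false) {}

  int write_row(const unsigned char *row)
  {
    if (! txn_started)   /* repeated at the top of *every* call */
    {
      begin_txn_internal();
      txn_started= true;
    }
    return do_write(row);
  }
private:
  void begin_txn_internal() {}                      /* engine internals, */
  int do_write(const unsigned char *) { return 0; } /* stubbed for brevity */
};

In the Drizzle API the kernel makes the startTransaction()/startStatement()/endStatement()/commit()/rollback() calls itself, so none of this guessing is needed.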

A while ago I spotted Bug 587772, which was the READ COMMITTED isolation level not working correctly with InnoDB. It turns out that the most basic example for READ COMMITTED failed. Hrrm… this is no good. It worked on MySQL, so this was certainly something that we broke. What was more worrying is that there wasn’t a test for this in the test suite (and at the time I couldn’t find one in the MySQL test suite either, so I think we inherited the missing test).

I recently started delving in, actually going to solve this. I noticed something worrying: endStatement wasn’t being called, which is where the innobase plugin would release the read view that it used for the statement. You’d think that it would grab a new one on startStatement, but because of the previous design of the API (remember “if txn isn’t started, start it!”) this also covered getting the read view for the statement… so we instead got a REPEATABLE READ isolation level.

I wanted a test.

Previously, I’ve created a dummy storage engine (tableprototester) and used it to test the server code for reading the table protobuf message. I thought about doing a Storage Engine for this problem too, basically looking at the calls to the Storage Engine as transitions between states in a state machine.

A basic view of a transaction could be:

State transitions for a transaction: a transaction can be empty or have one or more statements.

That is, a transaction starts and has zero or more statements before it commits or gets rolled back.

By coding up a data structure of allowable state transitions, a small function to assert() on invalid transitions and enough of the boilerplate to make the engine “work”, I was able to hit an assert() exactly where I’d expected it: at an invalid transition from START STATEMENT to COMMIT.
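
The guts of it are about this simple – a sketch under my own naming, not the test engine’s actual source:

#include <cassert>
#include <set>
#include <utility>

enum State { BEGIN, START_STATEMENT, END_STATEMENT, COMMIT, ROLLBACK };
typedef std::pair<State, State> Transition;

static std::set<Transition> allowed;

static void init_allowed()
{
  allowed.insert(Transition(BEGIN, START_STATEMENT));
  allowed.insert(Transition(START_STATEMENT, END_STATEMENT));
  allowed.insert(Transition(END_STATEMENT, START_STATEMENT));
  allowed.insert(Transition(END_STATEMENT, COMMIT));
  allowed.insert(Transition(END_STATEMENT, ROLLBACK));
}

static State current= BEGIN;

static void transition(State next)
{
  /* START_STATEMENT -> COMMIT is not in the table, so the engine
     asserts right at the point where the kernel misbehaves */
  assert(allowed.count(Transition(current, next)) == 1);
  current= next;
}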

To fix the initial bug (READ COMMITTED not working), I filled in a few state transitions for the system as a whole that aren’t quite correct. From the diagram below, you can quite obviously see where the bugs are (it helps that I’ve coloured them red):

There is absolutely no sense in going BEGIN -> END STATEMENT or immediately to COMMIT. These should be relatively easy to solve too, but are separate bugs.

I wish to expand this in the future to cover Cursor as well. It will also be useful to ensure that DDL can be wrapped in transactions. Not to mention the last few HTON flags that exist (and should likely go away).

To generate the diagrams, I just wrote a little utility to dump out the state transitions in dot.
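
That utility really is little; printing edges in dot format is about all there is to it (a self-contained sketch):

#include <cstdio>

int main(void)
{
  static const char *edges[][2]= {
    { "BEGIN",           "START_STATEMENT" },
    { "START_STATEMENT", "END_STATEMENT"   },
    { "END_STATEMENT",   "START_STATEMENT" },
    { "END_STATEMENT",   "COMMIT"          },
    { "END_STATEMENT",   "ROLLBACK"        },
  };
  printf("digraph transaction_states {\n");
  for (unsigned i= 0; i < sizeof(edges) / sizeof(edges[0]); i++)
    printf("  %s -> %s;\n", edges[i][0], edges[i][1]);
  printf("}\n");   /* then run: dot -Tpng states.dot -o states.png */
  return 0;
}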

What was InnoDB+?

Yes, I said InnoDB+ with a plus sign at the end (also see the first comment here).

Please note that this blog post is only based on public information. It has absolutely nothing in it that I only could have learned from back when I worked at Sun or MySQL AB. Everything has links or pointers to where you can find the information out on the Internet and all thoughts are based on stringing these things together.

There was a lot of talk around the acquisition of Sun Microsystems by Oracle about MySQL (MySQL AB was bought by Sun). Some of the talk centred around Oracle and their ability to make a closed source version of MySQL with added bits that wouldn’t be released as GPL. They’ve since proved that they’re quite willing to do this to an open source project (see OpenSolaris).

Relatively recently, a bunch of history from the old InnoDB SVN trees was imported into the MySQL source tree. You can pull the revision of the SVN tree as of InnoDB Plugin 1.0.6 release by using revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/zip:6263  from the MySQL repository – or just use a branch I’ve put up on launchpad for it (lp:~stewart/haildb/innodb-1.0.6-from-svn).

The first revision from the SVN tree was created on 2005-10-27, which you may remember was not too long after Oracle acquired Innobase on the 7th of October that year. The next two revisions imported the 5.0 InnoDB code base, and then the 5.1 code base. Earlier history can be found via this blog post on Transactions on InnoDB.

According to Monty in the comment on the Pythian blog:

Oracle did work on a closed source version of InnoDB, codename InnoDB+, but they never released it, probably because our contract with them stopped them.

and from Eben Moglen’s letter to the EU Commission (via Baron Schwartz’s blog post):

Innobase could therefore have provided an enhanced version of InnoDB, like Oracle’s current InnoDB+, under non-GPL license

Most telling are the many references in the revision history to “branches/innodb+”, as well as this commit:

revno: 0.5.148
revision-id: svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:6329
parent: svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:6322
committer: vasil
timestamp: Thu 2009-12-17 11:00:17 +0000
message:
branches/innodb+: change name and version
Change name from “InnoDB Plugin” to “InnoDB+” and
version from 1.0.5 to 1.0.0.

So, from the revision history I’ve managed to work out that it likely was going to have the following features:

  • innodb_change_buffering (for values other than inserts)
    See revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/zip:4061
    Or, more tellingly revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:4053
    The latter tells about the merge of change buffering for delete-mark and delete in addition to the default of inserts.
  • Possibly compressed tables.
    revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:2316 seems to show that it may have been copied across: “branches/innodb+: Copy from branches/zip r2315” in the comment.  There’s a lot of other merges of branches/zip as well
  • Something named FTS
    There is “branches/fts” in revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:2325 and revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:2324  (there’s an import of a red-black tree implementation)
    If you also look at revid: svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:6776
    you’ll see references to a innofts+ branch with ha_innodb.cc in it.
    So between a red-black tree and handler changes, this is surely something interesting.
  • Persistent statistics (also revid: svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:6776)
  • Metrics Table (also revid: svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:6776)
  • posix_fadvise() hints to temp files used in creating indexes (revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:2342 )
  • Improved recovery performance
    See revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:2989
    Talks about using the red-black tree for sorted insertion into the flush_list
  • native linux aio (revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:3913 )
  • group commit (revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:3923 )
  • New mutex to protect flush_list (revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:6330)

and finally, in revid:svn-v4:16c675df-0fcb-4bc9-8058-dcc011a37293:branches/innodb%2B:6819 you can see the change from “InnoDB+” back to “InnoDB” as the built-in default for MySQL 5.5.

LCA Miniconf Call for Papers: Data Storage: Databases, Filesystems, Cloud Storage, SQL and NoSQL

This miniconf aims to cover many of the current methods of data storage and retrieval and attempt to bring order to the universe. We’re aiming to cover what various systems do, what the latest developments are and what you should use for various applications.

We aim for talks from developers of and developers using the software in question.

Aiming for some combination of: PostgreSQL, Drizzle, MySQL, XFS, ext[34], Swift (open source cloud storage, part of OpenStack), memcached, TokyoCabinet, TDB/CTDB, CouchDB, MongoDB, Cassandra, HBase….. and more!

Call for Papers open NOW (Until 22nd October).

HOWTO screw up launching a free software project

Josh Berkus gave a great talk at linux.conf.au 2010 (the CFP for linux.conf.au 2011 is open until August 7th) entitled “How to destroy your community” (lwn coverage). It was a simple, patented, 10-step program, finely honed over time to have maximum effect. Each step is simple and we can all name a dozen companies that have done at least three of them.

Simon Phipps this past week at OSCON talked about Open Source Continuity in practice – specifically mentioning some open source software projects that were at Sun but have since been abandoned by Oracle, different strategies you can put in place to ensure your software survives, and checklists for software you use to see if it will survive.

So what can you do to not destroy your community, but ensure you never get one to begin with?

Similar to destroying your community, you can just make it hard: “#1 is to make the project depend as much as possible on difficult tools.”

#1 A Contributor License Agreement and Copyright Assignment.

If you happen to be in the unfortunate situation of being employed, this means you get to talk to lawyers. While your employer may well have an excellent Open Source Contribution Policy that lets you hack on GPL software on nights and weekends without a problem – if you’re handing over all the rights to another company – there gets to be lawyer time.

Your 1hr of contribution has now just ballooned. You’re going to use up resources of your employer (hey, lawyers are not cheap), it’s going to suck up your work time talking to them, and if you can get away from this in under several hours over a few weeks, you’re doing amazingly well – especially if you work for a large company.

If you are the kind of person with strong moral convictions, this is a non-starter. It is completely valid to not want to waste your employer’s time and money for a weekend project.

People scratching their own itch, however small, is how free software gets to be so awesome.

I think we got this almost right with OpenStack. If you compare the agreement to the Apache License, there’s so much common wording it ends up pretty much saying that you agree you are able to submit things to the project under the Apache license. This (of course) makes the entire thing pretty redundant: if people are going to be dishonest about submitting things under the Apache license, there’s no reason they won’t be dishonest and sign this too.

You could also never make it about people – just make it about your company.

#2 Make it all about the company, and never about the project

People are not going to show up and do free work for you to make your company huge and yourself rich.

People are self serving. They see software they want only a few patches away, they see software that serves their company only a few patches away. They see software that is an excellent starting point for something totally different.

I’m not sure why this is down at number three… it’s possibly the biggest one for danger signs that you’re going to destroy something that doesn’t even yet exist…

#3 Open Core

This pretty much automatically means that you’re not going to accept certain patches for reasons of increasing your own company’s short-term profit, i.e. software is no longer judged on technical merits, but rather political ones.

There is enough politics in free software as it is, creating more is not a feature.

So when people ask me about how I think the OpenStack launch went, I really want people to know how amazing it can be to just not fuck it up to begin with. Initial damage is very, very hard to ever undo. The number of Open Source software projects originally coming out of a company that are long running, have a wide variety of contributors and survive the original company is much smaller than you think.

PostgreSQL has survived many companies coming and going around it, and is stronger than ever. MySQL only has a developer community around it almost in spite of the companies that have shepherded the project. With Drizzle I think we’ve been doing okay – I think we need to work on some things, but they’re more generic to teams of people working on software in general rather than anything to do with a company.

A tale of a bug…

So I sometimes get asked if we funnel bug reports or patches back to MySQL from Drizzle. Also, MariaDB adds some interest here as they are a lot closer to (and indeed compatible with) MySQL. With Drizzle, we have deviated really quite heavily from the MySQL codebase. There are still some common areas, but they’re getting rarer (especially to just directly apply a patch).

Back in June 2009, while working on Drizzle at Sun, I found a bug that I knew would affect both. The patch would even directly apply (well… close, but I made one anyway).

So the typical process of me filing a MySQL bug these days is:

  • Stewart files bug
  • In the next window of Sveta being awake, it’s verified.

This happened within a really short time.

Unfortunately, what happens next isn’t nearly as awesome.

Namely, nothing. For a year.

So a year later, I filed it in launchpad for MariaDB.

So, MariaDB was gearing up for a release and this was a relatively low priority bug (but one with a working, correct and obvious patch); within two months, Monty applied it and improved the error checking around it.

So MariaDB bug 588599 is Fix Committed (June 2nd 2010 – July 20th 2010), MySQL Bug 45377 is still Verified (July 20th 2009 – ….).

(and yes, this tends to be a general pattern I find)

But Mark says he gets things through… so yay for him.

ENUM now works properly (in Drizzle)

Over at the Drizzle blog, the recent 2010-06-07 tarball was announced. This tarball release has my fixes for the ENUM type, so that it now works as it should. I was quite amazed that such a small block of code could have so many bugs! One of the most interesting was the documented limit we inherited from MySQL (see the MySQL Docs on ENUM) of a maximum of 65,535 elements for an ENUM column.

This all started out from a quite innocent comment of Jay’s in a code review for adding support for the ENUM data type to the embedded_innodb engine. It was all pretty innocent… saying that I should use a constant instead of the magic 0x10000 number as a limit on an assert for sanity of values getting passed to the engine. Seeing as there wasn’t a constant already in the code for that (surprise number 1), I said I’d fix it properly in a separate patch (creating a bug for it so it wouldn’t get lost) and the code went in.

So, now, a few weeks after that, I got around to dealing with that bug (because hey, this was going to be an easy fix that’ll give me a nice sense of accomplishment). A quick look in the Field_enum code raised my suspicions of bugs… I initially wondered if we’d get any error message if a StorageEngine returned a table definition that had too many ENUM elements (for example, 70,000). So, I added a table to the tableprototester plugin (a simple dummy engine that is loaded for testing the parsing of specially constructed table messages) that had 70,000 elements for a single ENUM column. It didn’t throw an error. Darn. It did, however, have an incredibly large result for SHOW CREATE TABLE.

Often with bugs like this I may try to see if the problem is something inherited from MySQL. I’ll often file a bug with MySQL as well if that’s the case. If I can, I’ll sometimes attach the associated patch from Drizzle that fixes the bug, sometimes with a patch directly for and tested on MySQL (if it’s not going to take me too long). If these patches are ever applied is a whole other thing – and sometimes you get things like “each engine is meant to have auto_increment behave differently!” – which doesn’t inspire confidence.

But anyway, the MySQL limit is somewhere between 10850 and 10900. This is not at all what’s documented. I’ve filed the appropriate bug (Bug #54194) with reproducible test case and the bit of problematic code. It turns out that this is (yet another) limit of the FRM file. The limit is “about 64k FRM”. The bit of code in MySQL that was doing the checking for the ENUM limit was this:


/* Hack to avoid bugs with small static rows in MySQL */
  reclength=max(file->min_record_length(table_options),reclength);
  if (info_length+(ulong) create_fields.elements*FCOMP+288+
      n_length+int_length+com_length > 65535L || int_count > 255)
  {
    my_message(ER_TOO_MANY_FIELDS, ER(ER_TOO_MANY_FIELDS), MYF(0));
    DBUG_RETURN(1);
  }

So it’s no surprise to anyone how this specific limit (the number of elements in an ENUM) got missed when I converted Drizzle from using an FRM to a protobuf-based structure.

So, a bunch of other cleanup and a whole lot of extra testing later, I can pretty confidently state that the ENUM type in Drizzle works exactly how you think it would.

Either way, if you’re getting anywhere near 10,000 choices for an ENUM column you have no doubt already lost.