Dogfooding a pastebin

July 3rd, 2009

http://pastebin.flamingspork.com/

A pastebin running on Drizzle and the Drizzle PHP Extension (which is built on top of libdrizzle).

linux.conf.au 2010 announces first keynote speaker!

June 24th, 2009

An exciting announcement from the linux.conf.au 2010 team! Benjamin Mako Hill will be a keynote speaker. I’m rather excited now – Mako is a great speaker and I’m looking forward to LCA2010 even more (if that’s possible)!

Unwired and Australian Government Content Filtering Trial

June 24th, 2009

Just got an email from Unwired asking if I’d like to voluntarily join a trial. A censorship trial. The wonderful “you can’t know what you aren’t allowed to see” form of “trust me” democracy embraced by our current government.

I first used Unwired for the time it took for Telstra to recover from screwing me when I moved (and bringing the DSL connection with me). I’ve kept the device around to enable on-the-road net connection on occasion and as a backup to my DSL line.

I’ll now look for an alternative backup internet solution.

Kernel Conference Australia

June 12th, 2009

Earlybird prices are available until today, so if you're interested in OS kernels (or hack on one), and not exclusively Linux (i.e. you're interested in other platforms too), head over to http://wikis.sun.com/display/KCA/ and have a look.

Drizzle pluggable MetadataStore (or: no table definition file on disk)

June 9th, 2009

My code is shaping up rather nicely (see https://code.launchpad.net/~stewart-flamingspork/drizzle/discovery) and I’m planning to submit a merge-request for it later today.

I’m about to commit code that implements a MetadataStore for the ARCHIVE engine. This means that for ARCHIVE tables, you only have the .ARZ file on disk. The table definition protobuf is stored in the ARZ during createTable() and the ARCHIVE MetadataStore can read it.

The StorageEngine now gets the drizzled::message::Table (i.e. the table definition protobuf) as a parameter. Eventually, we will fully be using this to tell the engine about the table structure (this is a work-in-progress). The advantages of using the proto as the standard way of passing around table definitions are numerous. I see it as almost essential to get this into the replication log for cross-DBMS replication.

We still have the default way of storing table metadata: in a table definition file (for MySQL it's the FRM; for Drizzle it's the table proto serialized into a file ending in '.dfe'). However, in my discovery branch, if an engine provides its own MetadataStore, then the StorageEngine is responsible for storing the table definition (either in its data file or in its data dictionary). It is then also responsible for making sure rename works and that the definition is cleaned up on DROP TABLE.

The MetadataStore provided by the StorageEngine is also used when searching for metadata such as for SHOW CREATE TABLE, SHOW TABLES, INFORMATION_SCHEMA, CREATE LIKE and when getting the table definition before opening the table.

The way the ARCHIVE MetadataStore works is that it reads the table proto out of the header of the ARZ file when asked for it. This has the side effect of now being able to copy ARZ files between servers and have it “just work”.
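A rough C++ sketch of the idea of a data file that carries its own table definition. This is not the real ARZ header layout (which isn't described here): the magic number, the [magic][length][definition][rows] layout and the function names are all hypothetical, and the serialized table proto is treated as an opaque byte string. The point is only that anything able to open the file can recover the definition from it, with no separate table definition file.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical magic number marking a data file that embeds its definition.
constexpr std::uint32_t kDefinitionMagic = 0x44454631;  // arbitrary value

// Write a 32-bit value in little-endian byte order.
static void put_u32(std::ostream& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out.put(static_cast<char>((v >> (8 * i)) & 0xFF));
}

// Read a little-endian 32-bit value; empty if the file is too short.
static std::optional<std::uint32_t> get_u32(std::istream& in) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) {
    int c = in.get();
    if (c == std::char_traits<char>::eof()) return std::nullopt;
    v |= static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << (8 * i);
  }
  return v;
}

// Create a data file: [magic][definition length][definition bytes][row data].
// The definition is opaque here (e.g. a serialized table protobuf).
void create_table_file(const std::string& path, const std::string& definition,
                       const std::string& rows) {
  assert(definition.size() <= UINT32_MAX);  // length must fit the header field
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) throw std::runtime_error("cannot create " + path);
  put_u32(out, kDefinitionMagic);
  put_u32(out, static_cast<std::uint32_t>(definition.size()));
  out.write(definition.data(), static_cast<std::streamsize>(definition.size()));
  out.write(rows.data(), static_cast<std::streamsize>(rows.size()));
  if (!out) throw std::runtime_error("write failed for " + path);
}

// Recover only the embedded definition, which is what an engine-specific
// metadata store would hand back for SHOW CREATE TABLE and friends.
std::optional<std::string> read_table_definition(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return std::nullopt;
  auto magic = get_u32(in);
  if (!magic || *magic != kDefinitionMagic) return std::nullopt;
  auto len = get_u32(in);
  if (!len) return std::nullopt;
  std::string definition(*len, '\0');
  if (!in.read(definition.data(), static_cast<std::streamsize>(*len)))
    return std::nullopt;
  return definition;
}
```

Because the definition travels inside the data file, copying that one file to another server is enough for the definition to come along with it.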

It would be really nice if we interfaced directly with the InnoDB Data Dictionary (or even just stored the table protos in an InnoDB table manipulated in the same transaction as the DDL), as that would move us a lot closer to closing a number of places where we (and MySQL) are not crash-safe.

Drizzle Tarballs for next milestone: aloha

June 9th, 2009

Want a quick build-and-play way to get Drizzle? We're dropping weekly-ish tarballs for the Aloha milestone. The latest milestone also has preliminary GCC 4.4 support.

You can see regular announcements on:

Pluggable Metadata stores (or… the revenge of table discovery)

May 27th, 2009

Users of the ARCHIVE or NDB storage engines in MySQL may be aware of a MySQL feature known as "table discovery". For ARCHIVE, you can copy the archive data file around between servers and it magically works (you don't need to copy the FRM). For MySQL Cluster (NDB), it means that when you CREATE TABLE on one MySQL server, other MySQL servers can get the FRM for that table from the cluster.

With my work to replace the FRM with a protobuf structure in Drizzle and clean up parts of the API around it, this feature didn’t really survive in any working state.

Instead, I'm now doing things closer to the right way: pluggable metadata stores. The idea is that the whole "table proto on disk" code (in MySQL it's the FRM, but in Drizzle we're now using a protobuf structure) is pluggable and could be replaced by an implementation specific to an engine (e.g. the InnoDB or NDB data dictionaries) or by a different generic one.

Currently, the default plugin is the same way we’ve been doing it forever: file-per-table on disk in a directory that’s the database. The API has a nasty bit now (mmmm… table name encoding), but that’ll be fixed in the future.
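To make "pluggable" a little more concrete, here is a minimal C++ sketch of what such an interface might look like: an abstract class with operations to store, fetch, rename and drop a table's serialized definition, plus a default implementation that keeps one file per table in a directory per database. The class and method names are hypothetical rather than Drizzle's actual API, the definition is an opaque string standing in for the serialized proto, and table name encoding is ignored entirely.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical interface: an engine could supply its own implementation
// (e.g. backed by its data dictionary) instead of the default one below.
class MetadataStore {
 public:
  virtual ~MetadataStore() = default;
  virtual bool store(const std::string& db, const std::string& table,
                     const std::string& definition) = 0;
  virtual std::optional<std::string> get(const std::string& db,
                                         const std::string& table) const = 0;
  virtual bool rename(const std::string& db, const std::string& from,
                      const std::string& to) = 0;
  virtual bool drop(const std::string& db, const std::string& table) = 0;
};

// Default behaviour: <datadir>/<db>/<table>.dfe, one file per table.
class FilePerTableStore : public MetadataStore {
 public:
  explicit FilePerTableStore(fs::path datadir) : datadir_(std::move(datadir)) {}

  bool store(const std::string& db, const std::string& table,
             const std::string& definition) override {
    std::error_code ec;
    fs::create_directories(datadir_ / db, ec);  // the database directory
    if (ec) return false;
    std::ofstream out(path_for(db, table), std::ios::binary | std::ios::trunc);
    out.write(definition.data(), static_cast<std::streamsize>(definition.size()));
    return static_cast<bool>(out);
  }

  std::optional<std::string> get(const std::string& db,
                                 const std::string& table) const override {
    std::ifstream in(path_for(db, table), std::ios::binary);
    if (!in) return std::nullopt;  // no such table
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
  }

  bool rename(const std::string& db, const std::string& from,
              const std::string& to) override {
    std::error_code ec;
    fs::rename(path_for(db, from), path_for(db, to), ec);
    return !ec;
  }

  bool drop(const std::string& db, const std::string& table) override {
    std::error_code ec;
    return fs::remove(path_for(db, table), ec) && !ec;
  }

 private:
  fs::path path_for(const std::string& db, const std::string& table) const {
    assert(!table.empty());
    return datadir_ / db / (table + ".dfe");
  }

  fs::path datadir_;
};
```

An engine that keeps definitions in its own data dictionary would implement the same four operations against that dictionary instead of the filesystem.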

The rest of this week will be dedicated to plugging this into all the bits in the server that manipulate the files manually.

With luck, I’ll have modified the ARCHIVE engine by then too so that there’ll just be the archive data file on disk with the table metadata stored in it.

Is BIT_LENGTH() useful?

May 19th, 2009

mysql [localhost] {msandbox} ((none)) > select length(crc32(3)) * 8, bit_length(crc32(3));
+----------------------+----------------------+
| length(crc32(3)) * 8 | bit_length(crc32(3)) |
+----------------------+----------------------+
|                   80 |                   80 |
+----------------------+----------------------+
1 row in set (0.00 sec)
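The two columns agree because BIT_LENGTH() is just the byte length times 8, and CRC32() returns an integer that gets converted to its decimal string before either function measures it. A small C++ sketch of that arithmetic, assuming MySQL's CRC32() is the standard zlib CRC-32 (reflected polynomial 0xEDB88320); the function names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bitwise CRC-32 (zlib / IEEE 802.3 variant, reflected polynomial
// 0xEDB88320), assumed here to match what MySQL's CRC32() computes.
std::uint32_t crc32(const std::string& s) {
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : s) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return crc ^ 0xFFFFFFFFu;
}

// Mirror of the query: CRC32(3) sees the string "3", and its integer
// result is rendered as decimal text before it is measured.
std::string crc32_as_text(const std::string& s) { return std::to_string(crc32(s)); }

// LENGTH(): number of bytes in the string.
std::size_t length_of(const std::string& s) { return s.size(); }

// BIT_LENGTH(): the same number, times 8 bits per byte.
std::size_t bit_length_of(const std::string& s) { return s.size() * 8; }
```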

Save the Devil: it’s what the cool kids are doing

May 18th, 2009

First it was linux.conf.au, and now Dreamhost is doing a $50 discount and $50-to-the-Devil deal.

The money goes to real research on an infectious cancer that is fatal to the Devils.

We managed to raise an amazing amount of money at linux.conf.au for the Devils (expect a press release with the final tallies real-soon-now, as the last of the pledges is trickling into our bank account).

So save a cartoon character and if you haven’t already, head to tassiedevil.com.au to find out what you can do.

MySQL Storage Engine SLOCCount over releases

May 15th, 2009

For a bit more info, what about the various storage engines across MySQL releases? Have they changed much? Here we're looking at the storage/X/ directory for code, so for some engines this excludes the handler that interfaces with the MySQL Server.

You can view the data on the spreadsheet.

NDB Kernel size over releases

May 15th, 2009

So Jonas pointed out that the NDB kernel hasn’t changed too much in size over releases. Let’s have a look:

In fact, the size went down slightly from 4.1 to 5.0. In this, 6.4 and 7.0 are the same thing but appear twice for completeness.

You can see the raw results in the spreadsheet here.

Size of Storage Engines

May 15th, 2009

For whatever reason, let’s look at “Total Physical Source Lines of Code” from a recent mysql-6.0 tree (and PBXT from PBXT source repo):

See the spreadsheet here.

Raw data (physical SLOC):

Engine          SLOC
Blackhole        336
CSV             1143
Archive         2960
MyISAM         34019
PBXT           41732
Maria          69019
InnoDB         82557
Falcon         91158
NDB           365272

NDB has a 100,000 line test suite.

PBXT supports MySQL and Drizzle.

Conclusions to draw? Err… none really.

Congratulations Sheeri on having the book out!

May 14th, 2009

The MySQL Administrator’s Bible is out. Writing a book is not something you can just squeeze into a Sunday afternoon; it takes real dedication and more effort than you could possibly imagine.

So congrats on having the book for MySQL DBAs (and I’d venture to say application devs should also be reading it) out and on Amazon so people can buy it now.

MySQL Sandbox is awesome

May 12th, 2009

I’m surprised I haven’t used it before. I do have to say that MySQL Sandbox is incredibly awesome. In this case, it saved me waiting for a compile of MySQL 6.0, which is always a good thing.

Does linux fallocate() zero-fill?

May 9th, 2009

In an email discussion about pre-allocating binlogs for MySQL (something we'll likely have to do for Drizzle and replication), Yoshinori brought up the excellent point that in some situations you don't want to be doing zero-fill, as getting up and running quickly is the most important thing.

So what does Linux do? Does it zero-fill, or behave sensibly and pre-allocate quickly?

Let's look at the kernel:

Inside the fallocate implementation (fs/open.c):

if (inode->i_op->fallocate)
    ret = inode->i_op->fallocate(inode, mode, offset, len);
else
    ret = -EOPNOTSUPP;

And for ext4:

/*
 * currently supporting (pre)allocate mode for extent-based
 * files _only_
 */
if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
    return -EOPNOTSUPP;

XFS has always done fast pre-allocate, so it’s not a problem there and the only other filesystems to currently support fallocate are btrfs and ocfs2 – which we don’t even have to start worrying too much about yet :)

But this is just kernel behaviour. I *think* libc ends up wrapping it, treating an EOPNOTSUPP from the kernel as "let me zero-fill" (which might be useful to check). Anybody want to check libc for me?

This was all on slightly post 2.6.30-rc3 (from git: 8c9ed899b44c19e81859fbb0e9d659fe2f8630fc)
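For the binlog case, one option that doesn't depend on what libc does is to call Linux's fallocate() directly and decide for yourself what happens on EOPNOTSUPP. A minimal C++ sketch, assuming Linux and glibc (which declares fallocate() in <fcntl.h> when _GNU_SOURCE is defined); the helper name and the 64 KiB chunk size are arbitrary choices for illustration:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc only declares fallocate() with this defined
#endif
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

// Reserve [offset, offset + len) for fd on Linux. Try fallocate() first;
// if the filesystem (or kernel) can't do it, fall back to writing zeros.
// Returns 0 on success, otherwise an errno value.
int preallocate(int fd, off_t offset, off_t len) {
  assert(offset >= 0 && len > 0);
  if (fallocate(fd, 0, offset, len) == 0) return 0;  // fast path
  int err = errno;
  if (err != EOPNOTSUPP && err != ENOSYS) return err;  // a real error

  // Slow path: zero-fill. Only the part past the current end of file is
  // written, so existing data is never overwritten. Unlike fallocate(),
  // holes already inside the file are not filled.
  struct stat st;
  if (fstat(fd, &st) != 0) return errno;
  off_t pos = offset > st.st_size ? offset : st.st_size;
  const off_t end = offset + len;

  std::vector<char> zeros(64 * 1024);  // value-initialised to zero bytes
  while (pos < end) {
    off_t remaining = end - pos;
    size_t chunk = remaining < static_cast<off_t>(zeros.size())
                       ? static_cast<size_t>(remaining)
                       : zeros.size();
    ssize_t n = pwrite(fd, zeros.data(), chunk, pos);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted: retry the same chunk
      return errno;
    }
    pos += n;
  }
  return 0;
}
```

Given Yoshinori's point, a caller that cares more about starting quickly might prefer to return the error instead of taking the zero-fill path.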

C++ STL bitset only useful for known-at-compile-time number of bits

May 6th, 2009

Found in the libstdc++ docs:

Extremely weird solutions. If you have access to the compiler and linker at runtime, you can do something insane, like figuring out just how many bits you need, then writing a temporary source code file. That file contains an instantiation of bitset for the required number of bits, inside some wrapper functions with unchanging signatures. Have your program then call the compiler on that file using Position Independent Code, then open the newly-created object file and load those wrapper functions. You’ll have an instantiation of bitset for the exact N that you need at the time. Don’t forget to delete the temporary files. (Yes, this can be, and has been, done.)

Oh yeah – feel the love.

Brought to you by the stl-is-often-worse-for-you-than-meth dept.
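The less insane option when the number of bits is only known at runtime is to size the storage at runtime: std::vector<bool> does that in the standard library, and Boost has dynamic_bitset. A hand-rolled C++ sketch of the same idea, packing bits into 64-bit words (the class name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A bitset whose size is chosen at runtime, packed into 64-bit words.
class RuntimeBitset {
 public:
  explicit RuntimeBitset(std::size_t nbits)
      : nbits_(nbits), words_((nbits + 63) / 64) {}  // words start at zero

  std::size_t size() const { return nbits_; }

  void set(std::size_t i, bool value = true) {
    assert(i < nbits_);
    std::uint64_t mask = std::uint64_t{1} << (i % 64);
    if (value)
      words_[i / 64] |= mask;
    else
      words_[i / 64] &= ~mask;
  }

  bool test(std::size_t i) const {
    assert(i < nbits_);
    return (words_[i / 64] >> (i % 64)) & 1u;
  }

  // Number of set bits. Bits past nbits_ are never set, so whole words count.
  std::size_t count() const {
    std::size_t n = 0;
    for (std::uint64_t w : words_)
      for (; w != 0; w &= w - 1) ++n;  // clear the lowest set bit each step
    return n;
  }

 private:
  std::size_t nbits_;
  std::vector<std::uint64_t> words_;
};
```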

Dodge Avenger furthers the stereotypes of American cars

April 26th, 2009

“Every time it goes around a corner you are going to die” would roughly be an accurate statement.

This is the car I’ve been driving for the past week. Absolutely no feeling in the steering at all (in fact less feeling when cornering).

It also bongs at you seemingly randomly. It especially likes not turning off the headlights when you get out of the car – even after you’ve locked it.

I’m sure it has torque… but only one of them.

Horsepower… sure, just not sure about plural. Foot to the floor and it kind of sits there wondering what you mean.

All this talk of “US carmakers failing” seems to be not a moment too soon.

I think I’m going to have to put a ban on driving American made cars…. except perhaps the Ford GT… I wonder if MySQL Sun Oracle would have a problem with that :)

Feedback from MySQL Cluster tutorial

April 26th, 2009

Way back on Monday (at the MySQL Conference and Expo), I gave a full-day tutorial on MySQL Cluster. I awoke early in the morning to an "oh ha ha" URL in an IM; but no, it wasn't jetlag playing tricks with me. Luckily, this didn't take much (if anything) away from the purpose of the day: teaching people about NDB.

Distracting-and-this-time-really-annoying-thing-of-the-day-2: It seems that O'Reilly had cut back on power this year, and there were no power boards in the room. A full-day interactive tutorial, and nowhere to plug in laptops. Hrrm... Luckily, over the many years I've been speaking at this event, I've gotten to know the AV guys, so I asked them. They totally deserve a medal. The tutorial started at 8:30, I noticed at 7:30, and it was all fixed by 7:45. The front half of the room (enough for everyone coming) had enough power for everyone. It was quite okay to bunch everybody up; it means I have to run around less.

This year's tutorial was modified from last year's (and that does take time, even though I've given it many times before). I wanted to remove out-of-date things, trim bits down (to better fit the time we have, based on more experience of how long the interactive parts take) and add a bit.

When we got to the end of the day (yes, I ran over… and everybody stayed, so either I’m really scary or the material is really interesting) I pleaded for feedback. It’s amazingly scary doing an interactive tutorial. You’re placing the success of the session not so much on you, but on everyone who’s come to it.

Sometimes I’ve gotten not much feedback at all; this time was different. I spoke to a number of people afterwards (and some via email) and got some really good suggestions for small changes that would have greatly enhanced the day for them. I was pleased that they also really enjoyed the tutorial and liked the interactivity. I (and it seems a great many others) do not much like tutorials that are just long talks.

People walked out of my tutorial with a good overview of what MySQL Cluster was, how to set one up, use one, do a bit of admin and some of how it works.

I even dragged Jonas up to explain the two-phase commit protocol for transactions in great detail. Of course, this is detail you never need to know to deploy, but people are interested in internals.

So far the session has received an average of just over 4 stars in evaluations (four five-star, two four-star and one two-star). I'd be really interested in feedback from the person who gave two stars, as this may mean I missed getting something done for them (e.g. providing information, help etc). Even though it is hard to spread yourself around a room of 60-ish-plus people, I do like to do it well. There is also the possibility of people not coming prepared, which means they may be bored for a lot of the day if they don't jump in with another group and learn that way.

So, I’m rather happy with how my first session went.

On my way to MySQL Conference and Expo, Drizzle Developer Day all in sunny California

April 17th, 2009

Well… Santa Clara – not as cool as California leads you to believe…. but it’ll be an awesome week. See you all there soon.

Drizzle low-hanging-fruit

April 16th, 2009

We have an ongoing Drizzle milestone called low-hanging-fruit. The idea is that when there's something that could be done, but we don't quite have the time to do it immediately, we'll add a low-hanging-fruit blueprint, so that people looking to get started on the codebase and contribute code to Drizzle have a place to find things to do.

Some of my personal favourites are:

Writing plugins can also be relatively low-hanging fruit. Some simple plugin types include:

  • Authentication
    Got somewhere that you could authenticate against for connecting to a DB? Write a plugin for it! Current auth plugins are auth_http and auth_pam.

    • Perhaps you want to authenticate against a central DB? checking in memcached first?
    • Perhaps a htaccess style method
  • Functions
    Apply some function to a column. These are pretty simple to write (see md5, compress examples). Perhaps interfaces to encryption/decryption? a hashing function?

    • ROT13
    • 3DES
    • AES
      Bonus points if you get any of these to use the T2000 crypto accelerator stuff
    • ID3 tag decoding
    • file type detection (well.. BLOB)
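As an illustration of how small the core of a function plugin can be, here is the per-value transformation a ROT13 function would apply, as a C++ sketch. Only the string logic is shown: the Drizzle plugin registration and argument handling are left out, since they aren't described here, and the function name is hypothetical.

```cpp
#include <cassert>
#include <string>

// Rotate ASCII letters by 13 places; everything else passes through.
// Applying it twice returns the original string.
std::string rot13(const std::string& in) {
  std::string out = in;
  for (char& c : out) {
    if (c >= 'a' && c <= 'z')
      c = static_cast<char>('a' + (c - 'a' + 13) % 26);
    else if (c >= 'A' && c <= 'Z')
      c = static_cast<char>('A' + (c - 'A' + 13) % 26);
  }
  return out;
}
```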

So there’s a fair bit you can do to get started. Best of all, you can chat with the Drizzle developers next week at the MySQL Conference and Expo and Drizzle Developer Day.