Table discovery for Drizzle (take 2, now merged!)

Table discovery looks a bit different from the previous time I blogged about it. Everything now just hangs off the StorageEngine. If you want to avoid having .dfe files on disk and instead use your own data dictionary, you need to implement two things:

  • A method to get table metadata
  • An iterator over the table names in a database in your engine

I’ve done this for the ARCHIVE storage engine (and that’s in Drizzle trunk now), and have been reading up on the Embedded InnoDB docs to see their API to the InnoDB data dictionary. I’m rather excited about getting it going at some point in the future (feel free to beat me to it and submit a patch, though!)
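As a sketch of the shape of that contract (class and method names here are hypothetical, not the actual Drizzle StorageEngine API), an engine backing onto its own in-memory dictionary might look something like:

```cpp
// Illustrative sketch only: names are made up, not the real Drizzle API.
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for the serialized table definition protobuf.
typedef std::string TableDefinition;

class StorageEngineSketch {
  // The engine's own "data dictionary": table name -> definition.
  std::map<std::string, TableDefinition> dictionary_;

public:
  void createTable(const std::string &name, const TableDefinition &def) {
    dictionary_[name] = def;
  }

  // 1. A method to get table metadata.
  bool getTableDefinition(const std::string &name,
                          TableDefinition &def) const {
    std::map<std::string, TableDefinition>::const_iterator it =
        dictionary_.find(name);
    if (it == dictionary_.end())
      return false;
    def = it->second;
    return true;
  }

  // 2. An iterator over the table names the engine knows about.
  std::vector<std::string> tableNames() const {
    std::vector<std::string> names;
    for (std::map<std::string, TableDefinition>::const_iterator it =
             dictionary_.begin();
         it != dictionary_.end(); ++it)
      names.push_back(it->first);
    return names;
  }
};
```

With those two pieces in place, SHOW TABLES and friends can go straight to the engine instead of scanning for .dfe files on disk.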

MyISAM as temporary only engine

Finally merged into main. I added the ability for engines to be temporary-only – that is, their tables can only be created via CREATE TEMPORARY TABLE, or be created and used internally during query execution. This allows us to refactor/remove some other code and move towards a “locking is inside the engine” mantra, as anything but row-level locking or true MVCC is certainly the exception these days.
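Conceptually it’s just a capability flag checked at table-creation time; a minimal sketch (the flag and function names are made up for illustration, not Drizzle’s actual code):

```cpp
// Hypothetical sketch of a "temporary only" engine capability flag.
#include <cassert>

enum EngineFlags {
  ENGINE_FLAG_TEMPORARY_ONLY = 1 << 0 // illustrative name
};

// A temporary-only engine rejects plain CREATE TABLE;
// CREATE TEMPORARY TABLE (and internal temp tables) are fine.
bool createTableAllowed(unsigned engine_flags, bool is_temporary) {
  if ((engine_flags & ENGINE_FLAG_TEMPORARY_ONLY) && !is_temporary)
    return false;
  return true;
}
```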

Debian unstable on a Sun Fire T1000

So I got the T1000 working again (finally, after much screwing about trying to get the part). I then hit the ever-annoying “no console” problem, where the console didn’t work – kind of problematic.

After a firmware upgrade, and passing “console=/dev/ttyS0” to the kernel, things work.

So the T1000 firmware 6.3 doesn’t work with modern Debian kernels. Things work with 6.7 though.

Unwired and Australian Government Content Filtering Trial

Just got an email from Unwired asking if I’d like to voluntarily join a trial. A censorship trial. The wonderful “you can’t know what you aren’t allowed to see” form of “trust me” democracy embraced by our current government.

I first used Unwired for the time it took Telstra to recover from screwing me over when I moved (and bring the DSL connection with me). I’ve kept the device around to enable an on-the-road net connection on occasion and as a backup to my DSL line.

I’ll now look for an alternative backup internet solution.

Drizzle pluggable MetadataStore (or: no table definition file on disk)

My code is shaping up rather nicely (see https://code.launchpad.net/~stewart/drizzle/discovery) and I’m planning to submit a merge-request for it later today.

I’m about to commit code that implements a MetadataStore for the ARCHIVE engine. This means that for ARCHIVE tables, you only have the .ARZ file on disk. The table definition protobuf is stored in the ARZ during createTable() and the ARCHIVE MetadataStore can read it.

The StorageEngine now gets the drizzled::message::Table (i.e. the table definition protobuf) as a parameter. Eventually, we will be fully using this to tell the engine about the table structure (this is a work in progress). The advantages of using the proto as the standard way of passing around table definitions are numerous. I see getting it into the replication log as almost essential for cross-DBMS replication.
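To make that concrete, here’s a simplified sketch of the kind of thing such a message contains; the field names are illustrative only, not the actual drizzled::message::Table schema:

```protobuf
// Illustrative only – not the real drizzled::message::Table definition.
message Table {
  required string name = 1;
  optional string engine = 2;

  message Field {
    required string name = 1;
    required string type = 2;
  }
  repeated Field field = 3;
}
```

Because it’s a protobuf, any engine (or a replication consumer on a completely different DBMS) can deserialize the same bytes and get the full table structure.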

We still have the default way of storing table metadata – in a table definition file (for MySQL it’s the FRM; for Drizzle it’s the table proto serialized into a file ending in ‘.dfe’). However, in my https://code.launchpad.net/~stewart/drizzle/discovery branch, if an engine provides its own MetadataStore, then the StorageEngine is responsible for storing the table definition (either in its data file or its data dictionary). It is then also responsible for making sure rename works and that the definition is cleaned up on DROP TABLE.

The MetadataStore provided by the StorageEngine is also used when searching for metadata such as for SHOW CREATE TABLE, SHOW TABLES, INFORMATION_SCHEMA, CREATE LIKE and when getting the table definition before opening the table.

The ARCHIVE MetadataStore works by reading the table proto out of the header of the ARZ file when asked for it. This has the nice side effect that you can now copy ARZ files between servers and have them “just work”.
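The general idea – and this is a simplified sketch, not the actual azio/ARZ header format – is just a length-prefixed blob at the front of the data file, so the definition travels with the file when it’s copied:

```cpp
// Sketch: store the serialized table proto length-prefixed at the
// front of the data file; row data would follow the header.
// Not the actual ARZ on-disk format.
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

void writeDataFile(const std::string &path, const std::string &proto) {
  std::ofstream out(path.c_str(), std::ios::binary);
  uint32_t len = proto.size();
  out.write(reinterpret_cast<const char *>(&len), sizeof(len));
  out.write(proto.data(), proto.size());
  // ... compressed row data would follow ...
}

std::string readTableProto(const std::string &path) {
  std::ifstream in(path.c_str(), std::ios::binary);
  uint32_t len = 0;
  in.read(reinterpret_cast<char *>(&len), sizeof(len));
  std::string proto(len, '\0');
  if (len > 0)
    in.read(&proto[0], len);
  return proto;
}
```

Copy the file to another server, read the header back, and you have the complete table definition – no separate .dfe (or FRM) needed.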

It would be really nice if we directly interfaced with the InnoDB data dictionary (or even just stored the table protos in an InnoDB table, manipulated in the same transaction as the DDL), as we’d then move a lot closer to closing a number of places where we (and MySQL) are not crash-safe.

Drizzle Tarballs for next milestone: aloha

Wanting a quick build-and-play way to get Drizzle? We’re dropping weekly-ish tarballs for the Aloha milestone. The latest milestone also has preliminary GCC 4.4 support.

You can see regular announcements on:

Pluggable Metadata stores (or… the revenge of table discovery)

Users of the ARCHIVE or NDB storage engines in MySQL may be aware of a MySQL feature known as “table discovery”. For ARCHIVE, you can copy the archive data file around between servers and it magically works (you don’t need to copy the FRM). For MySQL Cluster (NDB) it works so that when you CREATE TABLE on one MySQL server, other MySQL servers can get the FRM for these tables from the cluster.

With my work to replace the FRM with a protobuf structure in Drizzle and clean up parts of the API around it, this feature didn’t really survive in any working state.

Instead, I’m now doing things closer to the right way: pluggable metadata stores. The idea being that the whole “table proto on disk” code (in MySQL it’s the FRM, but in Drizzle we’re now using a protobuf structure) is pluggable and could be replaced by an implementation specific to an engine (e.g. the InnoDB or NDB data dictionaries) or a different generic one.

Currently, the default plugin is the same way we’ve been doing it forever: file-per-table on disk in a directory that’s the database. The API has a nasty bit now (mmmm… table name encoding), but that’ll be fixed in the future.

The rest of this week will be dedicated to plugging this into all the bits in the server that manipulate the files manually.

With luck, I’ll have modified the ARCHIVE engine by then too so that there’ll just be the archive data file on disk with the table metadata stored in it.

Save the Devil: it’s what the cool kids are doing

Following on from linux.conf.au, Dreamhost are now doing a $50-discount-and-$50-to-the-Devil deal.

Money going to real research – on an infectious cancer that is fatal to the Devils.

We managed to raise an amazing amount of money at linux.conf.au for the Devils (expect a press release with the final tallies real-soon-now, as the last of the pledges is trickling into our bank account).

So save a cartoon character and if you haven’t already, head to tassiedevil.com.au to find out what you can do.

MySQL Storage Engine SLOCCount over releases

For a bit more info, what about the various storage engines over MySQL releases? Have they changed much? Here we’re looking at the storage/X/ directory for code, so for some engines this excludes the handler that interfaces with the MySQL server.

You can view the data on the spreadsheet.

NDB Kernel size over releases

So Jonas pointed out that the NDB kernel hasn’t changed too much in size over releases. Let’s have a look:

In fact, the size went down slightly from 4.1 to 5.0. In this, 6.4 and 7.0 are the same thing but appear twice for completeness.

You can see the raw results in the spreadsheet here.

Size of Storage Engines

For whatever reason, let’s look at “Total Physical Source Lines of Code” from a recent mysql-6.0 tree (and PBXT from PBXT source repo):

See the spreadsheet here.

Raw data:

Engine          SLOC
Blackhole        336
CSV             1143
Archive         2960
MyISAM         34019
PBXT           41732
Maria          69019
InnoDB         82557
Falcon         91158
NDB           365272

NDB has a 100,000 line test suite.

PBXT supports MySQL and Drizzle.

Conclusions to draw? Err… none really.

Congratulations Sheeri on having the book out!

The MySQL Administrator’s Bible is out. Writing a book is not something you can just squeeze into a Sunday afternoon; it takes real dedication and more effort than you could possibly imagine.

So congrats on having the book for MySQL DBAs (and I’d venture to say application devs should also be reading it) out and on Amazon so people can buy it now.

Does linux fallocate() zero-fill?

In an email discussion about pre-allocating binlogs for MySQL (something we’ll likely have to do for Drizzle and replication), Yoshinori brought up the excellent point that in some situations you don’t want to be doing zero-fill, as getting up and running quickly is the most important thing.

So what does Linux do? Does it zero-fill, or behave sensibly and pre-allocate quickly?

Let’s look at the kernel:

Inside the fallocate implementation (fs/open.c):

if (inode->i_op->fallocate)
        ret = inode->i_op->fallocate(inode, mode, offset, len);
else
        ret = -EOPNOTSUPP;

and for ext4:
/*
 * currently supporting (pre)allocate mode for extent-based
 * files _only_
 */
if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
        return -EOPNOTSUPP;

XFS has always done fast pre-allocate, so it’s not a problem there and the only other filesystems to currently support fallocate are btrfs and ocfs2 – which we don’t even have to start worrying too much about yet :)

But this is just kernel behaviour – I *think* libc ends up wrapping an EOPNOTSUPP from the kernel as “let me zero-fill” (which might be useful to check). Anybody want to check libc for me?
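For what it’s worth, you can poke at this from userspace with posix_fallocate(); my understanding (worth verifying against the glibc source) is that glibc emulates the allocation by writing to the file when the underlying filesystem doesn’t support the fallocate syscall – which is exactly the slow zero-fill-ish path in question, so timing this on different filesystems shows whether you got the fast path. A quick check:

```cpp
// Allocate space for a file with posix_fallocate() and return the result.
// posix_fallocate() returns 0 on success or an errno value on failure
// (it does not set errno). Linux-specific sketch.
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int preallocate(const char *path, off_t len) {
  int fd = open(path, O_CREAT | O_WRONLY, 0644);
  if (fd < 0)
    return -1;
  int ret = posix_fallocate(fd, 0, len);
  close(fd);
  return ret;
}
```

On a filesystem with real fallocate support (XFS, ext4 with extents) this should return near-instantly regardless of size; if it takes time proportional to the length, you’re on the libc write-it-yourself fallback.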

This was all on slightly post 2.6.30-rc3 (from git: 8c9ed899b44c19e81859fbb0e9d659fe2f8630fc)