Archive for the ‘General’ Category

OpenOffice.org is the most frustrating piece of software I use

Saturday, August 21st, 2010

No, really.

I have recently been constructing a 100 page document going over a whole bunch of the details for the Monorail we’re building at Burning Man this year.

Apart from randomly freezing, and then suddenly not displaying images until I had restarted it – it’s also really slow.

The last straw came when I was leafing through the document before getting it printed. I had inserted a bunch of pages before this last section. But now, there was this empty page in the last section of the document. The part that I hadn’t touched for days. If I tried to remove the blank page, all the images on nearby pages moved so that they were on top of each other.

I ended up just printing it. There is a blank page that I can’t get rid of.

It is a piece of software that worries me. Is this really meant to be an alternative? It has NEVER worked well for me. Basic tasks sure, but I continually find myself pining for Word 5.1a on the Mac (System 7 that is) or Nisus Writer or even ClarisWorks.

If opening Microsoft Word documents fairly accurately is your only good feature, how do you expect to survive in the free (software) world?

So, while my twitter stream may suggest desires for punching the developers in the face or their early demise through painful methods….. I really just wish that sometime in the past 10 years you had made it not shit me to tears.

Certainly another failure of Sun Microsystems and I don’t expect Oracle to do any better at all (especially considering recent actions).

Burning Man shower test

Thursday, August 5th, 2010

Yazz has been making an epic over engineered shower for camp.

540W of LED rope for monorail track

Friday, July 30th, 2010

Testing how much power the rope will pull off the generators

ENUM now works properly (in Drizzle)

Tuesday, June 29th, 2010

Over at the Drizzle blog, the recent 2010-06-07 tarball was announced. This tarball release has my fixes for the ENUM type, so that it now works as it should. I was quite amazed that such a small block of code could have so many bugs! One of the most interesting was the documented limit we inherited from MySQL (see the MySQL Docs on ENUM) of a maximum of 65,535 elements for an ENUM column.

This all started out from a quite innocent comment of Jay’s in a code review for adding support for the ENUM data type to the embedded_innodb engine. It was all pretty innocent… saying that I should use a constant instead of the magic 0x10000 number as a limit on an assert for sanity of values getting passed to the engine. Seeing as there wasn’t a constant already in the code for that (surprise number 1), I said I’d fix it properly in a separate patch (creating a bug for it so it wouldn’t get lost) and the code went in.

So, now, a few weeks after that, I got around to dealing with that bug (because hey, this was going to be an easy fix that’ll give me a nice sense of accomplishment). A quick look in the Field_enum code raised my suspicions of bugs… I initially wondered if we’d get any error message if a StorageEngine returned a table definition that had too many ENUM elements (for example, 70,000). So, I added a table to the tableprototester plugin (a simple dummy engine that is loaded for testing the parsing of specially constructed table messages) that had 70,000 elements for a single ENUM column. It didn’t throw an error. Darn. It did, however, have an incredibly large result for SHOW CREATE TABLE.

Often with bugs like this I may try to see if the problem is something inherited from MySQL. I’ll often file a bug with MySQL as well if that’s the case. If I can, I’ll sometimes attach the associated patch from Drizzle that fixes the bug, sometimes with a patch directly for and tested on MySQL (if it’s not going to take me too long). Whether these patches are ever applied is a whole other thing – and sometimes you get things like “each engine is meant to have auto_increment behave differently!” – which doesn’t inspire confidence.

But anyway, the MySQL limit is somewhere between 10850 and 10900. This is not at all what’s documented. I’ve filed the appropriate bug (Bug #54194) with reproducible test case and the bit of problematic code. It turns out that this is (yet another) limit of the FRM file. The limit is “about 64k FRM”. The bit of code in MySQL that was doing the checking for the ENUM limit was this:


  /* Hack to avoid bugs with small static rows in MySQL */
  reclength=max(file->min_record_length(table_options),reclength);
  if (info_length+(ulong) create_fields.elements*FCOMP+288+
      n_length+int_length+com_length > 65535L || int_count > 255)
  {
    my_message(ER_TOO_MANY_FIELDS, ER(ER_TOO_MANY_FIELDS), MYF(0));
    DBUG_RETURN(1);
  }

So it’s no surprise to anyone how this specific limit (the number of elements in an ENUM) got missed when I converted Drizzle from using an FRM over to a protobuf based structure.

So a bunch of other cleanup later, a whole lot of extra testing and I can pretty confidently state that the ENUM type in Drizzle does work exactly how you think it would.

Either way, if you’re getting anywhere near 10,000 choices for an ENUM column you have no doubt already lost.

A warning to Solaris users…. (fsync possibly doesn’t)

Thursday, May 27th, 2010

Read the following:

Linux has its fair share of dumb things with data too (ext3 not defaulting to using write barriers is a good one). This is however particularly nasty… I’d have really hoped there were some good tests in place for this.

This should also be a good warning to anybody implementing advanced storage systems: we database guys really do want to be able to write things reliably and you really need to make sure this works.

So, Stewart’s current list of stupid shit you have to do to ensure a 1MB disk write goes to disk in a portable way:

  • You’re a database, so you’re using O_DIRECT
  • Use < 32k disk writes
  • fsync()
  • Write 32-64MB of sequential data to hopefully force everything out of the drive write cache and onto the platter to survive power failure (because barriers may not be on). Increase this based on whatever caching system happens to be in place. If you think there may be battery backed RAID… maybe 1GB or 2GB of data writes
  • If you’re extending the file, don’t bother… that especially seems to be buggy. Create a new file instead.
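A rough sketch of the "small writes, fsync(), new file" part of that list, assuming a POSIX system. The helper name write_in_small_chunks and the 16KB chunk size are made up for illustration. O_DIRECT is left out because it needs aligned buffers, the "write a pile of sequential data afterwards" step is omitted, and the fsync() of the containing directory is my own addition rather than something from the list. Whether fsync() really gets the data to the platter is, of course, the whole problem.

```python
import os

CHUNK = 16 * 1024  # assumption: any size under 32k, per the list above


def write_in_small_chunks(path, data, chunk=CHUNK):
    """Write data to a NEW file in small chunks, then fsync file and directory.

    Hypothetical helper: O_DIRECT is omitted (it needs aligned buffers) and
    the cache-flushing writes from the list are not attempted here.
    """
    # O_EXCL: insist on creating a new file rather than extending an old one.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        offset = 0
        while offset < len(view):
            # os.write may write fewer bytes than asked; advance by what it did.
            offset += os.write(fd, view[offset:offset + chunk])
        os.fsync(fd)
    finally:
        os.close(fd)
    # Also fsync the containing directory so the new directory entry is flushed.
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

Calling write_in_small_chunks("data.new", payload) raises FileExistsError if the file is already there, which is deliberate: the point is never to extend an existing file.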

Of course you could just assume that the OS kind of gets it right…. *laugh*

nocache LD_PRELOAD

Tuesday, May 25th, 2010

Want to do something like “cp big_file copy_of_big_file” or “tar xfz big_tarball.tar.gz” but without thrashing your cache?

Enrico Zini has a nice little LD_PRELOAD called nocache.

$ nocache tar xfz foo.tar.gz

Goes well with libeatmydata. A pair of tools for compensating for your Operating System casually hating you.

I imagine people will love this when taking database backups.

Exporting a set of bzr revisions as a quilt series

Thursday, May 20th, 2010

There has to be a better way than this… but it does work (at least for revisions 11 through 141):

for rev in $(seq 11 141); do
    prev=$((rev - 1))
    # Skip any revision that touches files under tests/
    if [ -z "$(bzr diff -r"$prev".."$rev" | diffstat -p0 -l | grep '^tests')" ]; then
        (
            # Strip the bzr log decoration (separator, revno, committer, ...)
            # and unindent, leaving the commit message as the patch header
            bzr log -r"$rev" --forward --log-format=long |
                sed -e 's/^  //;
/^------------------------------------------------------------/d;
/^revno:.*$/d; /^committer:.*/d; /^branch nick:/d;
/^timestamp: /d; /^message:/d'
            echo
            echo
            bzr diff -r"$prev".."$rev" --prefix a/storage/innodb_plugin/:b/storage/innodb_plugin/
        ) > "patches/$rev.patch"
        echo "$rev.patch" >> patches/series
    fi
done

Developing my own film

Monday, May 17th, 2010

dedicated bench, originally uploaded by macplusg3.

This is from the first film I’ve ever developed myself. I know a lot of people who’ve done this in school or something, but I never did.. so it’s just me, teaching myself (and playing with chemicals).

This was shot one day when I went out riding down in Black Rock (not too far from home). There’s something about benches dedicated to people that just twinges something in my brain… How do you get to the point where you think a great way to remember someone is to have a plaque on a bench? Carrying a camera while bike riding is quite useful sometimes.

Shot on Lucky B&W SHD100 film on an early 1970s Canon rangefinder.

desktop-couch has been nothing but suck

Saturday, May 8th, 2010

$ du -sh /home/stewart/.cache/desktop-couch/desktop-couchdb.*
746M	/home/stewart/.cache/desktop-couch/desktop-couchdb.log
4.0K	/home/stewart/.cache/desktop-couch/desktop-couchdb.pid
16K	/home/stewart/.cache/desktop-couch/desktop-couchdb.stderr
653M	/home/stewart/.cache/desktop-couch/desktop-couchdb.stdout

$ du -sh /home/stewart/.local/share/desktop-couch/.gwibber_messages_design/2f3267703246f5e02533e59714915b7d.view
436M	/home/stewart/.local/share/desktop-couch/.gwibber_messages_design/2f3267703246f5e02533e59714915b7d.view

I feel better already. I think the log files irritate me the most.

Drizzle Developer Day is TODAY!

Saturday, April 17th, 2010

http://drizzle.org/wiki/Drizzle_Developer_Day_2010

Upstairs in the Hyatt right near the Speaker room (down the hallway on the left from the main conference registration desk).

See you here!

The Drizzle (and MySQL) Key tuple format

Friday, April 2nd, 2010

Here’s something that’s not really documented anywhere (unless you count ha_innodb.cc as a source of server documentation). You may have some idea about the MySQL/Drizzle row buffer format. This is passed around the storage engine interface: in for write_row and update_row and out for the various scan and index read methods.

If you want to see the docs for it that exist in the code, check out store_key_val_for_row in ha_innodb.cc.

However, there is another format that is passed to your engine (and that your engine is expected to understand) and for lack of a better name, I’m going to call it the key tuple format. The first place you’ll probably see this is when implementing the index_read function for a Cursor (or handler in MySQL speak).

You get two things: a pointer to the buffer and the length of the buffer. Since a key can be made up of multiple parts, some of which can be NULL and some of which can be of variable length, this buffer is not (usually) a simple value. If you are starting out in your engine development, you can use this buffer blindly as a single value for non-nullable indexes with only 1 column.

The basic format is this:

  • The buffer is in the order of the index. The first column in the index is first in the buffer, the second column is second, and so on.
  • The buffer must be zero-filled. The server kernel will use memcmp to compare two key values.
  • If the column is NULLable, then its part of the buffer starts with a null-indicator byte: 1 if the value is NULL, 0 if it is not.
  • From ha_innodb.cc (for BLOBs, which I haven’t put in embedded_innodb yet): If the column is of a BLOB type (it must be a column prefix field in this case), then we put the length of the data in the field to the next 2 bytes, in the little-endian format. If the field is SQL NULL, then these 2 bytes are set to 0. Note that the length of data in the field is <= column prefix length.
  • For fixed length fields (such as int), the next max field length bytes are for that field.
  • For VARCHAR, there is always a 2 byte (in little endian) length. This is different to the row format, which may have 1 or 2 bytes. In the key tuple format it is ALWAYS two bytes.
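To make the layout a bit more concrete, here is a small sketch in Python that packs key parts following the rules above. It is an illustration only, not code from Drizzle or MySQL: the function names are made up, the INT is assumed to be a 4-byte little-endian signed value, and the VARCHAR slot is assumed to be zero-padded out to the maximum key part length (which fits with the zero-filled requirement, but check ha_innodb.cc before relying on it).

```python
import struct


def pack_int_part(value, nullable):
    """Key part for a 4-byte INT column (assumed little-endian, signed)."""
    if value is None:
        if not nullable:
            raise ValueError("NULL in a non-nullable key part")
        # NULL: indicator byte 1, rest of the fixed-width slot zero-filled.
        return b"\x01" + b"\x00" * 4
    prefix = b"\x00" if nullable else b""
    return prefix + struct.pack("<i", value)


def pack_varchar_part(value, max_len, nullable):
    """Key part for a VARCHAR column: 2-byte little-endian length, then data.

    Assumption: the data is padded with zero bytes up to max_len so that
    every key built for the same index has the same length.
    """
    if value is None:
        if not nullable:
            raise ValueError("NULL in a non-nullable key part")
        return b"\x01" + b"\x00" * (2 + max_len)
    data = value.encode("utf-8")
    if len(data) > max_len:
        raise ValueError("value longer than the key part")
    prefix = b"\x00" if nullable else b""
    return prefix + struct.pack("<H", len(data)) + data.ljust(max_len, b"\x00")


# A two-part key: a NULLable INT followed by a NOT NULL VARCHAR(8).
key = pack_int_part(42, nullable=True) + pack_varchar_part("abc", 8, nullable=False)
```

Because the parts are concatenated in index order and the unused bytes are zeroed, two keys built this way for the same index can be compared with a plain byte comparison.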

I’ll discuss the use of this for rnd_pos() and position() in a later post…

This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

libeatmydata for Solaris

Monday, March 22nd, 2010

Thanks to Olly Betts, libeatmydata now has Solaris support as of release-15. So for those of you living on Solaris and actually doing a real fsync() during your test runs… do not fret! Feedback much appreciated (even better in patch form).

Writing A Storage Engine for Drizzle, Part 2: CREATE TABLE

Friday, March 12th, 2010

The DDL code paths for Drizzle are increasingly different from MySQL. For example, the embedded_innodb StorageEngine CREATE TABLE code path is completely different than what it would have to be for MySQL. There are a number of reasons for this, the primary one being that Drizzle uses a protobuf message to describe the table format instead of several data structures and an FRM file.

We are pretty close to having the table protobuf message format being final (there’s a few bits left to clean up, but expect them done Real Soon Now (TM)). You can see the definition (which is pretty simple to follow) in drizzled/message/table.proto. Also check out my series of blog posts on the table message (more posts coming, I promise!).

Drizzle allows either your StorageEngine or the Drizzle kernel to take care of storage of table metadata. You tell the Drizzle kernel that your engine will take care of metadata itself by specifying HTON_HAS_DATA_DICTIONARY to the StorageEngine constructor. If you don’t specify HTON_HAS_DATA_DICTIONARY, the Drizzle kernel stores the serialized Table protobuf message in a “table_name.dfe” file in a directory named after the database. If you have specified that you have a data dictionary, you’ll also have to implement some other methods in your StorageEngine. We’ll cover these in a later post.

If you ever dealt with creating a table in MySQL, you may recognize this method:

virtual int create(const char *name, TABLE *form, HA_CREATE_INFO *info)=0;

This is not how we do things in Drizzle. We now have this function in StorageEngine that you have to implement:

int doCreateTable(Session* session, const char *path,
                  Table& table_obj,
                  drizzled::message::Table& table_message)

The existence of the Table parameter is largely historic and at some point will go away. In the Embedded InnoDB engine, we don’t use the Table parameter at all. Shortly we’ll also get rid of the path parameter, instead having the table schema in the Table message and helper functions to construct path names.

Methods named “doFoo” (such as doCreateTable) mean that there is a method named foo() (such as createTable()) in the base class. It does some base work (such as making sure the table_message is filled out and handling any errors) while the “real” work is done by your StorageEngine in the doCreateTable() method.
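The shape of that convention, sketched in Python for brevity. None of this is Drizzle’s real API: the class names, the dict standing in for the table message and the error code are all invented for illustration. The base class method does the common checks and then hands off to the engine’s do_ method.

```python
class StorageEngine:
    """Hypothetical stand-in for the base class; not Drizzle's real API."""

    def create_table(self, path, table_message):
        # Base work: sanity-check the message before handing it to the engine.
        if not table_message.get("name"):
            return 1  # made-up error code for illustration
        return self.do_create_table(path, table_message)

    def do_create_table(self, path, table_message):
        # Engines must override this; the base class has no real work to do.
        raise NotImplementedError


class SkeletonEngine(StorageEngine):
    """Toy engine that only records which tables it was asked to create."""

    def __init__(self):
        self.created = []

    def do_create_table(self, path, table_message):
        # The "real" work would go here; this sketch just remembers the call.
        self.created.append((path, table_message["name"]))
        return 0
```

Callers only ever use create_table(), so the common checks cannot be skipped by accident, and an engine author only has to think about do_create_table().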

The Embedded InnoDB engine goes through the table message and constructs a data structure for the Embedded InnoDB library to create a table. The ARCHIVE storage engine is much simpler, and it pretty much just creates the header of the ARZ file, mostly ignoring the format of the table. The best bet is to look at the code from one of these engines, depending on what type of engine you’re working on. This code, along with the table message definition should be more than enough.

This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

Bike Riding in the storm

Saturday, March 6th, 2010

Out on a pier down St Kilda, the weather looked… well… like it could be a bit annoying on the way back:

but then… just a bit down the way…. it hit:

It was “a bit wet”. Big blocks of ice falling from the sky (that hurt).

Anyway, on the way back we found a storm water drain:

Yes, behind Michael is just all water (and I’m not talking about the Bay).

Still managed to get a 36.5km ride out of it, so not all bad.

Writing A Storage Engine for Drizzle, Part 1: Plugin basics

Monday, March 1st, 2010

So, you’ve decided to write a Storage Engine for Drizzle. This is excellent news! The API is continually being improved and if you’ve worked on a Storage Engine for MySQL, you’ll notice quite a few differences in some areas.

The first step is to create a skeleton StorageEngine plugin.

You can see my skeleton embedded_innodb StorageEngine plugin in its merge request.

The important steps are:

1. Create the plugin directory

e.g. mkdir plugin/embedded_innodb

2. Create the plugin.ini file describing the plugin

Create the plugin.ini file in the plugin directory (so it’s plugin/plugin_name/plugin.ini).
An example plugin.ini for embedded_innodb is:

[plugin]
title=InnoDB Storage Engine using the Embedded InnoDB library
description=Work in progress engine using libinnodb instead of including it in tree.
sources=embedded_innodb_engine.cc
headers=embedded_innodb_engine.h

This gives us a title and description, along with telling the build system what sources to compile and what headers to make sure to include in any source distribution.

3. Add plugin dependencies

Your plugin may require extra libraries on the system. For example, the embedded_innodb plugin uses the Embedded InnoDB library (libinnodb).

Other examples include the MD5 function requiring either openssl or gnutls, the gearman related plugins requiring gearman libraries, the UUID() function requiring libuuid and BlitzDB requiring Tokyo Cabinet libraries.

For embedded_innodb, pandora-build has a macro for finding libinnodb on the system. We want to run this configure check, so we create a plugin.ac file in the plugin directory (i.e. plugin/plugin_name/plugin.ac) and add the check to it.

For embedded_innodb, the plugin.ac file just contains this one line:

PANDORA_HAVE_LIBINNODB

We also want to add two things to plugin.ini; one to tell the build system only to build our plugin if libinnodb was found and the other to link our plugin with libinnodb. For embedded_innodb, it’s these two lines:

build_conditional="x${ac_cv_libinnodb}" = "xyes"
ldflags=${LTLIBINNODB}

Not too hard at all! This should look relatively familiar for those who have seen autoconf and automake in the past.

Some plugins (such as the md5 function) have a bit more custom auto-foo in plugin.ini and plugin.ac (as one of two libraries can be used). You can do pretty much anything with the plugin system, but you’re a lot more likely to keep it simple like we have here.

4. Add skeleton source code for your StorageEngine

While this will change a little bit over time (and is a little long to just paste into here), you can see what I did for embedded_innodb in the skeleton-embedded-innodb-engine tree.

5. Build!

You will need to re-run ./config/autorun.sh so the build system picks up your new plugin. When you run ./configure --help afterwards, you should see options for building with/without your new plugin.

6. Add a test

You will probably want to add a test to see that your plugin loads successfully. When your plugin is built, the test suite automatically picks up any tests you have in the plugin/plugin_name/tests directory. This is in the same format as general MySQL and Drizzle tests: tests go in a t/ directory, expected results in a r/ directory.

Since we are loading a plugin, we will also need some server options to make sure that plugin is loaded. These are stored in the rather inappropriately named test-master.opt file (that’s the test name with “-master.opt” appended to the end instead of “.test”). For the embedded_innodb plugin_load test, we have a plugin/embedded_innodb/tests/t/plugin_load-master.opt file with the following content:

--plugin_add=embedded_innodb

You can have pretty much anything in the plugin_load.test file… if you’re fancy, you’ll have a SELECT query on data_dictionary.plugins to check that the plugin really is there. Be sure to also add a r/plugin_load.result file. (My preferred method is to just create an empty result file, run the test suite and examine the rejected output before renaming the .reject file to .result.)

Once you’ve added your test, you can run it either by just typing “make test” (which will run the whole test suite), or you can go into the main tests/ directory and run ./test-run.pl --suite=plugin_name (which will just run the tests for your plugin).

7. Check the code in, feel good about self

and you’re done. Well… the start of a Storage Engine plugin is done :)

This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

Playing with multiple exposure

Sunday, February 28th, 2010

So, I discovered that my D200 had a built in “multiple exposure” option. While you can do exactly the same thing in GIMP (or Photoshop I guess) a whole lot easier (for one, you get to see what’s going on), we had been discussing Holga earlier in the night… so I felt it kind of appropriate to not really see what I was doing.

Leah playing guitar hero, me sitting across the room only slightly distracting her with a camera.

Guitar Hero

Maybe I will end up getting a Holga one of these days… being restricted can be fun.

anti-anti-feature: Windows license stickers

Tuesday, February 23rd, 2010

Anti-Anti-Feature: An antifeature that doesn’t actually do what it’s meant to (something you didn’t want in the first place)

My laptop came with a Windows Vista license. An anti-feature in itself – I didn’t want it, have never used it (I run Ubuntu and love freedom).

However, if you try and read the license key off this sticker, it’s increasingly difficult to do so. It’s being worn away. Why? Because it’s on the bottom of the laptop and I’m using it on my lap (so friction rubs it away).

Luckily I don’t run Windows Vista, so I won’t need to re-install it any time soon.

on presenting

Tuesday, February 23rd, 2010

Dilbert.com

This is totally not confined to at-work presentations.

The number of sessions I have sat through that could have taken 5 minutes instead of 20, 30, 40 or even 60 is amazing. Remember: I have not flown half way around the globe to see you read. I have come to hear a story, to see how conclusions were formed, and to interact.

Often, the tools are deficient. PowerPoint encourages bad habits (you can use PowerPoint for excellent slide decks too, but ignore the temptations of boring templates, bad effects and dot lists). The dot point list is more often than not your enemy. I (and anybody else in the audience who has learnt to read) can read your dot points faster than you can. While I’m reading, I’m not listening to you. If you spoke a cure for all forms of cancer just after having put a slide up filled with dot points… 90% of people will miss it.

Now, dot points are an excellent way to remind you what the heck you’re meant to be talking about (and in what order). Use presenter notes! They are really useful.

If your laptop/presentation software doesn’t support a “presenter” mode that shows the presenter notes to you but not to the whole room, simply write them down, print them out, or anything like that. One simple practice run-through will let you do this seamlessly.

The last couple of presentations I did were completely assembled using 280slides.com. An excellent web app for doing presentations. It will import and export ODF (and other formats) so you’re not tied to a (unfortunately) non open source web app. That being said, it ran fine in my browser and unlike OpenOffice.org, did not make me want to stab people repeatedly every time I used it.

So, Stewart’s quick tips:

  • Tell a story. How did you get to your conclusions?
  • Don’t just read. Use visuals to accompany the talk. Visuals aren’t the talk.
  • Practice. Just once or twice through will make things a lot smoother.

Equipment:

  • Make sure your equipment works beforehand. Nobody wants to see you fiddle around with your Windows/OSX laptop only to find out you didn’t bring the dongle or can’t operate the Displays control panel. (Interestingly enough, I see Linux “just work” more than Windows or OSX these days).
  • If there is a microphone, use it. I don’t want to struggle to hear you.
  • If you are constantly using a laser pointer you either have too much on your slides or the slide does not highlight the important information. (laser pointers are useful when people ask questions though)

One blog I love on the subject is Presentation Zen. I’ll also recommend the book, but you can get so much just from the web site.

Some excellent recent presentations:

  • Simplicity Through Optimization, Paul McKenney
    Paul is able to explain RCU clearly and concisely through visuals. You are left with no doubt that this does really work. The visuals are not everything, they assist in the telling of the RCU story.
  • Teach every child about food, Jamie Oliver
    I watched this online. Note how not everything was smooth the whole way. Also note how this was still effective. Passion is an awesome tool. Check out the simple graph showing leading causes of death: simple and effective.
  • Bill Gates on energy: Innovating to Zero!
    Historically, Bill Gates has not been the most engaging speaker. We can all forget the horrible PowerPoint slides with four hundred dot points about some release of something that nobody cared about. This is different. Clear, concise, engaging and simple visuals to make the point.

First roll through the Nikon F80

Monday, February 22nd, 2010

A little while ago I bit the bullet and bought a Nikon film body – an F80. May as well have a film body that’s a bit automatic and takes the same lens mount as my digital.

So, I got it and thought “hrrm… I better run a roll of film through it to make sure it works”. Off to the fridge I went to find the cheapest, shittiest roll of film possible… I found “Walgreens” brand film. Manufactured by one of many, bought for cheap, and run through the F80.

Some shots turned out pretty good. I have the full set (most of the roll) up on flickr. A few choice ones are:

Which due to some nice accident of lighting, turned out pretty good. IIRC this was pretty late at night and I was editing photos as Michael came over (bringing much needed beer).

Slides and beer, do you need anything else? I just like this because it’s a snapshot of what I was working on (well, kinda, I was mostly just manipulating digital images).

Leah and I went bushwalking… so had to snap a shot of her. I do like the Nikon 50mm as a portrait lens. The film… well… it was cheap, but not too bad actually.

A shallow depth of field can be a lot of fun. Although not entirely sure how I feel about the bokeh….

Which has some odd colours. Nice, but odd.

I like my “new” body. It’ll be fun.

Anti-features mean pirates get all the good things

Monday, February 22nd, 2010

Spotted on Boing Boing:

The worst is renting… the number of times you have to press skip (or the damn disc doesn’t work) makes you start to wonder if you would have had a much better user experience if you had just downloaded it instead.