words cannot describe the pain

that is the general experience of Debian/Ubuntu, RAID and LVM setups for / and /boot and getting a working bootloader out of the installer…. grr… how come this NEVER works….

update: well, i now have a booting system… I even applied the updates, which fixed a few oddities with “Desktop Effects” (it’s called BLING dagnammit… i want to control my “Desktop Bling”… who on earth wants “desktop effects” when there’s the option of bling?). Unfortunately, the current kernel doesn’t boot at all… gets an oops suggesting running with irqpoll, which doesn’t help either. So back to the older kernel it is… and time to file a bug report. I think there’s a bug somewhere in the partitioner, LVM and RAID setup that gets mightily confused at some point that really ended up in a bad place. Now though, going through *very* carefully, I have grub booting off a RAID1 ext3 /boot no problems… even all my other file systems have come up okay… urgh.

MySQL Conf coming up (and memories of last year)

Andy Dustman just blogged referencing his previous posts on last year’s MySQL User Conference. This year’s is coming close (April 23-26) and the pressure to have my presentations all perfect is mounting (err.. by the way, they will be).

Last year was a blast. Long days (and into the evenings) with sessions, BoFs, food and beer discussing all sorts of things that in some way related back to databases (and rather often, surprisingly enough, MySQL).

What was also great was being able to talk to lots of people who are doing real things out in the real world about MySQL Cluster and whether it’s remotely suitable for their application. Often the answer can be “I think you’re looking for replication”, which is perfectly okay too.

I’m in a few days early (and around a few days after) – so if you’re around the area do give me a yell – it’d be cool to hang.

FYI, I’m giving the following sessions:

  • MySQL Cluster: The Complete Tutorial (Parts I and II)
    Which is a total of 6hrs of MySQL Cluster goodness. It’s aimed at people who know MySQL (or are pretty good with other RDBMSs and can fake it) and want to know about MySQL Cluster. It’s a hands-on tutorial, so be prepared!
  • Introduction to MySQL Cluster
    A 45 minute whirlwind introduction to MySQL Cluster. Assumes some MySQL knowledge. Good if you’ve heard about this cluster thing (even from just reading the title of this session) and want to know what it’s all about.
  • Exploring New Features in MySQL 5.1 Cluster
    A 45 minute blast of a session on what’s new for MySQL Cluster in the 5.1 release. This will cover just about everything that was in my presentation on the same topic last year. So if you came to last year’s and come to this one again… I’m going to make fun of you for being a groupie :)
  • Bleeding Edge MySQL Cluster: Upcoming Cool Things
    A whole hour on the stuff you shouldn’t use in production. The topic list is sort-of known… it really is what is the latest and greatest that should be coming to a tree somewhere, sometime… this year. We’ll no doubt talk about online add node, online add/drop attribute, multithreaded NDB kernel, API improvements and a whole lot more!
  • The Design and Internals of MySQL Cluster
    What happens under the hood in MySQL Cluster? Find out here! An hour for those with the real technical mind. If source code and network protocol descriptions scare you, possibly not for you – expect an hour of coolness.

Yes, there seems to be a “Stewart” track at the conf :) Apparently people enjoyed my session last year… so there was a tendency to accept my sessions this year.

Patching your mission-critical email syncing software on your live setup… my OfflineIMAP patch for today

I’ve used OfflineIMAP for quite a while now. On the whole I’m fairly happy with it. Today I sent this to the list:

Forgive the potentially bad python, not my native tongue :)

This patch is motivated by three things:
- offlineimap is extremely slow at syncing lots of locally deleted
messages
- offlineimap uses lots of memory
- LocalStatus files aren't written safely (a hard crash can cause
corruption)
        - I've been bitten by this in the past, causing a complete resync of
the folder... so I get duplicate messages.

I am currently using 4.0.14 (from Debian) with this patch. I used it to
convert the files and everything. Seems quite reliable and quick.

In my tests, execution time for a normal sync is relatively the same.

Execution time for when lots of messages have been deleted in a
reasonably sized folder (e.g. during re-organisation of mail folders) is
as much as 10x faster.

In my tests, running with 1 thread uses as much as 20% less memory with
this patch (i.e. about 160MB instead of 200MB+ for my maildir)

Disk space used by the LocalStatus files isn't much more... for me it
looks like it's 6.5MB now versus 4.5MB then. We get the added benefit of
indexes for all our queries... nice :)

I had to disable the threading for copying messages as this means that
LocalStatus objects are shared between threads, which pysqlite doesn't
like (it asserts).

I think the part of this patch that implements the uidexists does
actually slow things down compared with having the messagelist.... a
more optimal implementation may be possible, but I think the other speed
improvements (and memory savings) are worth it.

A future patch may convert other storage types to sqlite (or similar) to
further reduce memory consumption (and hopefully runtime).

This does add a dependency on pysqlite... which is packaged in debian
(and ubuntu) - and i'm using the stock packages for these.

Comments very much appreciated.
Of course, the patch is here. I’m using it now… although I’ll warn you that it does update your .offlineimap to a new format (and doesn’t provide you a way to go back, without restoring the backed-up LocalStatus files and probably getting message duplicates).

So, those around the MySQL circles I tend to hang around may ask “Why not libmysqld?” (the embedded MySQL server). Well… a few reasons… sqlite is file-per-db (even though I’m essentially using file-per-table here), the python bindings are everywhere (and work), it’s tiny and crash safe.
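To give a feel for what an SQLite-backed status store looks like, here is a minimal sketch. It is not the patch itself: the `StatusStore` class, the `status` table and its columns are made up for illustration, and it uses the `sqlite3` module from the Python standard library rather than the pysqlite package mentioned in the mail. Only the name `uidexists` is taken from the mail above.

```python
import sqlite3


class StatusStore:
    """Hypothetical stand-in for a per-folder LocalStatus store (one file per folder)."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        # uid is the primary key, so existence checks and deletes use an index.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS status ("
            "uid INTEGER PRIMARY KEY, flags TEXT NOT NULL)"
        )
        self.db.commit()

    def save(self, uid, flags):
        # "with self.db" wraps the statement in a transaction:
        # it commits on success and rolls back if an exception is raised.
        with self.db:
            self.db.execute(
                "INSERT OR REPLACE INTO status (uid, flags) VALUES (?, ?)",
                (uid, flags),
            )

    def uidexists(self, uid):
        # A single indexed lookup instead of keeping the whole message list in memory.
        row = self.db.execute(
            "SELECT 1 FROM status WHERE uid = ?", (uid,)
        ).fetchone()
        return row is not None

    def delete_many(self, uids):
        # All deletions go into one transaction, so a bulk delete is one commit.
        with self.db:
            self.db.executemany(
                "DELETE FROM status WHERE uid = ?", [(uid,) for uid in uids]
            )
```

Batching the deletes into a single transaction is the sort of thing that would help with the "lots of locally deleted messages" case, and the transaction is also what gives the crash safety that a hand-written flat file lacks.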

You may also ask “Why?”… well, I’ve been re-organising a bunch of mail folders, which means deleting a *lot* of messages from some folders (and moving them to others).. offlineimap has been really slow at this. So I fixed it, with code (not whining).

I also wrote a bit-of-a-hack perl script to remove duplicate messages from a bunch of folders (a bug in offlineimap had caused me to get several copies of each message in a bunch of my folders a while ago). So that script is here. Commented out are bits to do comparison via md5 as well as message-id. Don’t use unless you know what you’re doing… it may also use a few hundred MB of RAM on large (few hundred thousand message) folders.
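The message-id comparison can be sketched in a few lines. This is an illustration in Python, not the perl script: the function name `find_duplicates` and the `(key, raw_bytes)` input shape are assumptions made up for the example, and it only reports duplicates rather than deleting anything.

```python
import email


def find_duplicates(messages):
    """Return the keys of messages whose Message-ID was already seen.

    `messages` is an iterable of (key, raw_bytes) pairs. The first message
    with a given Message-ID is kept; later ones are reported as duplicates.
    Messages without a Message-ID header are never reported.
    """
    seen = set()
    duplicates = []
    for key, raw in messages:
        msg = email.message_from_bytes(raw)
        message_id = msg["Message-ID"]
        if message_id is None:
            continue
        message_id = message_id.strip()
        if message_id in seen:
            duplicates.append(key)
        else:
            seen.add(message_id)
    return duplicates
```

For a Maildir, the pairs could come from the standard library’s `mailbox.Maildir`, e.g. `((k, box.get_bytes(k)) for k in box.keys())`. Only the set of seen IDs stays in memory, not the messages themselves.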

Hopefully these will help improve my productivity.
Now, back to my regular programming….

JBOD can bite you… (and Ubuntu 7.04)

Okay, so one of the disks in a JBOD (well… single LVM) has been on the way out (hopefully can recover some stuff off it… there’s nothing completely important… but still).

I’ve now learnt, and my desktop has three new 320GB drives in a RAID5.

Currently installing Ubuntu 7.04 on it. I do have to say that the alternate install disk (which uses debian-installer) has a REALLY nice RAID and LVM setup now. If only it also let you pass parameters to mkfs it would be ideal.

Update: It got the bootloader horribly wrong though and I’ve gotten to piss-fart around trying to get LILO to install and boot. Current result? Blinking cursor in top left of screen. Fantastic… fucking fantastic.

faster net is da bomb!

So, while I was away it seems that Telstra enabled the 8Mbit down/1Mbit up stuff in their ADSL points in the exchanges (which has been technically possible for quite some time).

I enabled/upgraded my Internode plan to get the faster speed. It got activated or whatever yesterday, but I didn’t really see an improvement. Anyway, headed over to the Internode support site to check out their setup instructions for my ADSL modem – turns out that simply by changing from PPPoE to PPPoA I’ve gotten a huge speed boost.

Just pulled an LCA video from the Internode mirror at 862K/s. Rock.
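For a rough idea of how close that is to the line rate, here is the arithmetic as a tiny sketch. It assumes the “K” in the download speed means 1024 bytes and that 8Mbit means 8,000,000 bits per second; both are assumptions, and the function names are made up for the example.

```python
def kib_per_s_to_mbit(kib_per_s):
    """Convert a transfer rate in KiB/s (1024 bytes) to Mbit/s (10**6 bits)."""
    return kib_per_s * 1024 * 8 / 1_000_000


def fraction_of_line_rate(kib_per_s, line_mbit):
    """Share of the nominal line rate that the observed transfer rate uses."""
    return kib_per_s_to_mbit(kib_per_s) / line_mbit


observed = kib_per_s_to_mbit(862)        # roughly 7.06 Mbit/s
share = fraction_of_line_rate(862, 8)    # roughly 0.88 of the nominal 8 Mbit
```

So under those assumptions the download was using somewhere around 88% of the nominal 8Mbit.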

NDB Online Add Node Progress (or rather, testing it)

So, the sitch as of today:

Added an ndb_mgm_set_configuration() call to the mgmapi – which is a not-so-casually evil API call that sends a packed ndb_mgm_configuration object (like what you get from ndb_mgm_get_configuration) to the management server, which then resets its lists of nodes for event reporting and for ClusterMgr and starts serving things out of this configuration. Notably, if a data node restarts, it gets this new configuration.

By itself, this will let us write test programs for online configuration changes (e.g. changing DataMemory).

I’ve also added a Disabled property to data nodes. If set, just about everywhere ignores the node.

This allows a test program to test add/drop node functionality – without the need for external clusterware stopping and starting processes.

If you start with a large cluster, we can get a test program to disable some nodes and do an initial cluster restart (essentially starting a new, smaller cluster) and then add in the disabled nodes to form a larger cluster. Due to the way we do things, we actually still have the Transporters to the nodes we’re adding, which is slightly different than what happens in the real world. HOWEVER, it keeps the test program independent of any mechanism to start a node on a machine – so i don’t (for example) need to run ndb_cpcd on my laptop while testing.

But anyway, I now have a test program that we could run as part of autotest, which will happily take a 4 node cluster, start a 2 node cluster from it and online add 2 nodes.

Adding these new nodes into a nodegroup is currently not working with my patches though… for some reason the DBDICT transaction seems to not be going through the prepare phase… no doubt a bug in my code relating to something that’s changed in DBDICT in the past year.

So there is progress towards having the ability to add data nodes (and node groups) to a running cluster.

Online table re-organisation is another thing altogether though… and no doubt there are some good subtle bugs to be written.

mgmapi timeouts going in…

So my timeout patches for the MySQL Cluster Management API have been finished. This should solve a lot of people’s problems writing management API applications that want to do something sane when the management server either dies or somehow gets disconnected from you.
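To illustrate the general problem (a client that should not hang forever when the other end dies or goes quiet), here is a minimal sketch using plain Python sockets. It has nothing to do with the actual mgmapi code, which is C; the function name `read_line_with_timeout` and the line-based protocol are assumptions made up for the example.

```python
import socket


def read_line_with_timeout(sock, timeout_seconds):
    """Read one newline-terminated line from sock.

    Returns the line without its trailing newline, or None if the peer
    closed the connection or stayed silent for longer than timeout_seconds.
    The timeout applies to each recv() call, not to the whole line, and
    anything received after the first newline is discarded in this sketch.
    """
    sock.settimeout(timeout_seconds)
    buf = b""
    try:
        while b"\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                # Peer closed the connection before finishing the line.
                return None
            buf += chunk
    except socket.timeout:
        # Peer is still connected (as far as we know) but not answering.
        return None
    return buf.split(b"\n", 1)[0].decode("ascii", "replace")
```

The caller gets a clear “no answer” result back in bounded time and can then decide to reconnect, try another server or report an error, instead of blocking indefinitely.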

More importantly I should say, the autotest run looks good. It passed 199 tests in the daily-basic suite… which is a new record (I added some tests, so that could be classified as cheating)… probably would have been 200 if a sporadically failing test hadn’t failed :(

During my trip back to Melbourne, Jonas will probably apply these to a bunch of trees (at least some of the telco release) – with 5.1 coming at some point.

Bus Drivers

(as of the other day) Stockholm is now the only city where I’ve boarded a bus to see what I can only describe as a stunningly gorgeous bus driver.

The good news is she also passed the test for bus drivers – being able to drive. Seems to be a quality often found here. Back home, when regularly taking a bus (uni) it was inevitably a bit hit and miss.

I heart Gnome SSH Tunnel Manager

Jonas just switched me on to Gnome SSH Tunnel Manager – a simple GNOME app that stores a list of SSH tunnels you want and can automatically start and stop them.

Totally useful for those who travel (hrrm.. fair few MySQLers there) and/or always have SSH tunnels to places (hrrm… MySQLers there too).

There’s a debian package up there (and you can build one easily) but it’s not yet in the Ubuntu archive… maybe for the next release. But works fine on edgy for me!

irritation of the day….

There’s a lot of things about the MySQL bug tracking system I like… but there’s a few things that annoy the heck out of me.

Today it’s the fact that if you put a term in the “with any of the words” field on advanced search that’s a number (e.g. ‘839’ as you’re looking for bugs that talk about error 839) you get taken to bug number 839. Funnily enough, this has nothing to do with the NDB problem I’m trying to see the status of. grr…

and now, back to your regular programming…

Code size of an engine versus test suite

If you count the lines of code in the MySQL Cluster (NDB) test suite (mysql-5.1/storage/ndb/test – and exclude the old ODBC stuff) you come up with about 104,000 lines of code. This is in contrast to the other approximately 350,000 lines of code for the NDB engine (excluding the handler, which is an additional 12,000 lines – this isn’t tested much by the NDB test suite… mysql-test-run.pl is meant to take care of a lot of that).

If you go and check the MyISAM tree, it’s only 40,545 lines of code – for the entire engine. That’s right, the MySQL Cluster test suite is about 2.5 times the size of MyISAM.

If you look at mysql-test-run.pl tests, which are just lists of SQL commands with static data, it comes up at 250,000 lines (that excludes result files). The NDB tests do things programmatically – so can generate large amounts of data and different loads quite easily.

The architecture of the NDB tests (commonly referred to as autotest, ATRT or the HUGO framework) is very different from mysql-test-run.pl – it easily allows you to write a test that is high on concurrency, high on load and high on amount of data. It is also modular, so that when you get an issue from a customer (or need to do some benchmarking on a specific type of schema) you can use the utility programs to help you (e.g. there’s one that does random PK updates to tables, one that does scans, one that does index operations etc).

There’s this whole bunch of things you just cannot do with mysql-test-run.pl.

Then we get to fault injection… MySQL Cluster is a distributed system that is designed to withstand failure. Without testing this, we can never say it’s remotely HA. So we test it. A lot. We inject failures into nodes to check our node failure handling. Using the utility programs and some basic shell, it’s also possible to do custom tests (such as multi-node failure) where our test suite doesn’t have the best coverage yet.

Again, either not possible or extremely hard with mysql-test-run.pl.

mysql_slap is the hint of a nice utility to help in testing… but using it in mysql-test-run.pl scripts in a verifiable way (i.e. check what came out is what went in, using a variety of access methods – full table scans, pk scans, index scans, stored procs, cursors, views, joins etc) is tricky at best (but really impossible).
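The “check what came out is what went in” idea is easy to show in miniature. The sketch below is not the hugo classes and does not talk to MySQL at all: it uses SQLite through Python’s standard `sqlite3` module purely as a stand-in, and the function name, table name and column names are made up for the example. It generates data programmatically, does random PK updates while tracking the expected state, then reads everything back through three access paths.

```python
import random
import sqlite3


def run_verified_load(conn, rows, seed=0):
    """Load generated rows, update them at random, verify via several access paths."""
    rng = random.Random(seed)
    # The expected state lives in the test program, not in a static result file.
    expected = {pk: rng.randrange(1_000_000) for pk in range(rows)}

    conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
    conn.execute("CREATE INDEX t_val ON t (val)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", expected.items())

    # Random PK updates, mirrored into the expected state.
    for _ in range(rows):
        pk = rng.randrange(rows)
        expected[pk] = rng.randrange(1_000_000)
        conn.execute("UPDATE t SET val = ? WHERE pk = ?", (expected[pk], pk))
    conn.commit()

    # Access path 1: full table scan.
    scanned = dict(conn.execute("SELECT pk, val FROM t"))

    # Access path 2: one primary key lookup per row.
    by_pk = {
        pk: conn.execute("SELECT val FROM t WHERE pk = ?", (pk,)).fetchone()[0]
        for pk in expected
    }

    # Access path 3: lookup through the secondary index on val.
    by_index = all(
        (pk,) in conn.execute("SELECT pk FROM t WHERE val = ?", (val,)).fetchall()
        for pk, val in expected.items()
    )

    return scanned == expected and by_pk == expected and by_index
```

Because the data and the expected results are both generated by the program, scaling the test up is a matter of changing `rows`, which is exactly what a file of static SQL statements cannot do.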

Yes, I’m really pining for a better test suite infrastructure for the MySQL Server – it can only lead to better quality software…. even somebody just rewriting a bunch of the hugo classes to use the MySQL C API would be useful.

Pleading for a better mail suite….

or really just all the Evolution bugs that I consistently hit to be fixed.

Why does it need hundreds of megabytes of memory just to list a single mail folder? What could it possibly be doing, loading the entire mailbox into memory? ick..

Currently, after a crash, it’s been “checking folder consistency” for at least 10 minutes now… and apparently I have 13000 unread messages in INBOX. Bull. About 250 more likely. This will probably be some arse load of crack where I’ll have to remove the cache files, restart evo 10 times and sacrifice a goat to the gods of crackful annoying-apps.

mgmapi timeouts and resurrecting the online add node

The other day I managed to send off what are nearly the final patches for adding proper timeout support to the MySQL Cluster management API. Jonas has had a bit of a look, found one thing I’ve missed, but it’ll probably get in somewhere soon (probably the carrier grade edition first, then others… 5.1 makes sense IMHO if only for the amount of management server testing that my patches add).

Unfortunately in what we laughingly call the past the management server – for whatever hysterical raisins – never really received much direct testing. Sure, if the data nodes couldn’t get configuration, autotest couldn’t control the daemons or something then things were obviously broken. But, say, a subtle (or not so much) change in API or behaviour would certainly not be picked up.

Although the real “feature of the year” (not my words) is fault injection for the management server that we can use in testing. The MySQL Cluster kernel (data nodes) already has extensive fault injection that is regularly tested by ATRT (storage/ndb/test in the source tree).

I’ve also started to resurrect my online add node patch that I’ve had sitting around in various states for over a year (actually… about 14 months… i just haven’t touched it in 12) and port it to the latest 5.1 tree (as not sure where it’ll end up, start at the lowest common denominator – possible that it’ll end up in Carrier Grade first too). Now comes the problem of testing the sucker. Previously i’ve had a shockingly bad shell script and hard coded files to make this go.

Obviously, hard coded stuff is not the way to go. The real way is to be able to do everything neatly and programmatically so we can run it as part of the regular autotest.

oh LugRadio how funny you are

How many other open source/free software radio/podcast shows could pull off discussing morals and free software with a discussion on machines running free software that are used for a) efficient slaughter (of various things) and b) the violent anal raping of donkeys?

Apparently there’s people who think of these things…. and it’s hilarious.

(hrrm… what does that say about me finding that hilarious?)

hej hej

Great things:

  1. I get to see snow. I haven’t seen snow anywhere else in the world yet, just in Stockholm (apart from flying over places… but that doesn’t really count)
  2. The language is cool, a lot of people speak English (to varying degrees) and it’s not that hard to pick up enough to get by (especially since TV programs are subtitled… so watching Buffy on Swedish TV will educate you in enough Swedish to save the world from unspeakable demons)
  3. We have MySQLers here (including a good number of Cluster developers)
  4. They have the Internet here. Not like Australia, stuck on the arse end of the internet – oh no, 5Mbit is considered slow here.
  5. Stockholm really is a beautiful city.
  6. Public transport is frequent and close by (at least for the Stockholm area… which is where I am). Further into the center it’s even better, but here it’s good (where here is about 15-20mins via bus and subway to Liljeholmen, where the office is)
  7. R&D is (again, unlike Australia) valued highly here, with a good amount of high tech industry and a seeming respect for academia.
  8. There’s a chemist in Gamla Stan that’s been there for about 400 years. I haven’t bought anything from there, but I feel I should – to go with that beer from that pub that first got its license nearly 400 years ago that I had while in London.

And not so great…

  1. The only way to buy beer stronger than 3.5% is to go to the government run Systembolaget – which is closed at about any time you’d consider buying alcohol. Apparently the locals get around this by going there and just buying heaps at once – so completely defeating the attempt to get people to buy less. Oh, and if you like any decent liquor – it’s probably cheaper to drive/fly to another country and bring it back. Apparently that’s what people do… with vans. Lucky for me I picked up some Laphroaig on the way through London.
  2. Some things are expensive… and there are relatively high tax rates… although you seem to actually get something for that, so it’s not all bad (unlike in .au… where you seem to get nothing).
  3. It’s a long way from Melbourne, especially in economy seats… urggh. Not exactly a company policy I agree with for such long trips.

for now, hej då

Stockholm

Currently in the MySQL Cluster team office in Stockholm – and have been since Wednesday. I’ll be here for the next 3 weeks working in the office. This will be the longest amount of time I’ve worked in an actual office (instead of working from home) in more than 2.25 years!

I found Veronica Mars on TV last night… which is great, because I’ve sort of become addicted. Unfortunately, Sweden is a few episodes ahead of Australia…. so I’ve skipped a few now (go MythTV, record them for me baby). One really good thing about Swedish TV is that things are subtitled instead of dubbed – excellent if your Swedish isn’t that great (mine isn’t). Whenever here, I also seem to find some TV shows that look really interesting, except for the fact that it’s all in a language I don’t understand… certainly an interesting dilemma.

Today I’ve been working on material for the MySQL Cluster: The Complete Tutorial at the MySQL Conference and Expo. The conference is April 23-26 in Santa Clara, California (and if you register by March 14 you save $200 on registration). It’s going to be a SIX hour hands-on tutorial (with breaks, don’t worry). The hands-on part (I think) is very important… that way you walk away with real-world knowledge that you can directly apply – not just with theory that you could have gotten from reading a bit here and there.

I’m really hoping that as many people with existing knowledge (esp MySQLers) can be around during the session to help people when needed… I have a feeling there’ll be a few.