MySQL Conf: Getting Drunk with Eben Moglen

So Jay Pipes pointed out that Eben Moglen is speaking at the upcoming MySQL Conference in his attention-grabbing post: Getting Drunk with Eben Moglen.

I saw Eben speak at linux.conf.au 2005 in Canberra – which was totally totally awesome.

I’m really looking forward to seeing him again – honestly, it’s probably worth the conference admission fee just to see this session.

Scary

So far, this year, past and planned (as in currently reserved) travel means I will spend the equivalent of at least 21 work days inside aircraft – just for MySQL related travel.

eep.

2.1MB/minute

I have opened a web browser and typed this entry in about half the time it takes OpenOffice.org (2.0.4) to save my presentation file (a whole 2.1MB). eep.

Q&A with MySQL Cluster content (my 2c thrown in)

Ivan has posted the questions and answers from a Q&A session in which MySQL Cluster came up – I thought I’d add my perspective here as well:

Q from Matthew: When are we likely to see disk based indexing for ndb?
Disk-based indexing is planned for one of the future releases, but we can’t say when we will implement it. During the webinar, Anders pointed out that he does not see this as an important thing. I tend to agree with Anders, at least considering the current status of the storage engine. At the moment, ndb can do an unbeatable job (in terms of HA and performance) on small transactions and simple queries, and we should not consider it a full replacement for the whole database in general. Future versions of ndb will probably become more and more general purpose, and at some point a fully disk-based ndb will be valuable. Please take this as my personal opinion.

Implementing disk-based indexes is a fair bit of work… Certainly not this year (or early next). Sure, it’s a crucial step towards world domination… but it does have to sit in a priority queue of other steps.

Q from Malcolm: Is there any difference between MySQL Cluster and the telecoms version?
As Bertrand said, MySQL Cluster Carrier Grade is a specific version for telecoms, developed closely with major equipment manufacturers. During the presentation I highlighted some differences – such as the availability of more data nodes and so on. We will cover MySQL Cluster and MySQL Cluster Carrier Grade Edition in one of the future sessions.

It’ll be good to have a special session on the differences. The basic one is that we’re a bit more selective about which patches go into the Carrier Grade trees – and sometimes features will land there first (when customers really need them). We also typically try to be less invasive in some areas. Odds are, though, if you’re not a telco, you don’t need it.

Q from Fabio: Any plan for MySQL Cluster for Windows?
We are considering it sometime in the future, but no plans have been made so far.

Yes, this has been “being considered” for years. No, it’s not going to happen any time soon. Patches welcome.

Q from Owen: Is it difficult to define memory requirements for MySQL Cluster?
MySQL Cluster configuration is the most important step when you adopt this technology. We have seen several do-it-yourself configurations running perfectly. But Cluster configuration is not straightforward, and we always recommend getting some help from our Professional Services team.

Each time I patch ndb_size.pl it gets more accurate – it’s much less outrageously wrong in some scenarios now :) It can help… although you also need to know what you’re measuring – and account for future growth.

Q from Alessandro: Is Carrier Grade available for download?
As Bertrand said, please contact us at http://www.mysql.com/company/contact/ if you are interested in MySQL Cluster Carrier Grade for telecom customers.

I believe the plan is to publish the BK trees as well… but that’s certainly not the supported way to run it.

There was also some talk of DRBD and shared disk clusters. Neither of these protects against file system corruption. Also, if you’re using a non-crash-safe engine (e.g. MyISAM), when you fail over you’ll probably have to do a bunch of table checks – not exactly HA.

Record autotest numbers for NDB

So, with a bunch of recent tests I added (and some bugs that have been fixed) we’re now consistently getting 203 or 204 passing tests. We typically have around 8 or 9 that often fail – usually because the test is broken or not quite deterministic. Or there’s a bug… :)

(all numbers for the daily-basic list of tests for various 5.1 branches).

It would be great to hit 300 by this time next year… which means a lot of test cases… hrrm… anybody want to volunteer?

Oatmeal Raisin Biscuits (or Cookies if you’re from the land of the rolled r)

So I’ve slightly adapted this recipe (Vegan / Vegetarian Recipes – Biscuits – Oat – Oatmeal Raisin Cookies), using soy milk instead of orange juice (couldn’t be bothered going to the shops as I drank all the stuff I bought last weekend) and just olive oil instead of corn oil.

Currently in the oven. The dough tasted good though.

Update: they do taste rather good!

words cannot describe the pain

that is the general experience of debian/ubuntu, RAID and LVM setups for / and /boot and getting a working bootloader out of the installer…. grr… how come this NEVER works….

update: well, I now have a booting system… I even applied the updates, which fixed a few oddities with “Desktop Effects” (it’s called BLING dagnammit… I want to control my “Desktop Bling”… who on earth wants “desktop effects” when there’s the option of bling?). Unfortunately, the current kernel doesn’t boot at all… it gets an oops suggesting running with irqpoll, which doesn’t help either. So back to the older kernel it is… and time to file a bug report. I think there’s a bug somewhere in the partitioner’s LVM and RAID setup that gets mightily confused at some point and really ends up in a bad place. Now though, going through *very* carefully, I have grub booting off a RAID1 ext3 /boot no problems… even all my other file systems have come up okay… urgh.

MySQL Conf coming up (and memories of last year)

Andy Dustman just blogged referencing his previous posts on last year’s MySQL User Conference. This year’s is coming up fast (April 23–26) and the pressure to have all my presentations perfect is mounting (err… by the way, they will be).

Last year was a blast. Long days (and into the evenings) with sessions, BoFs, food and beer discussing all sorts of things that in some way related back to databases (and rather often, surprisingly enough, MySQL).

What was also great was being able to talk to lots of people who are doing real things out in the real world, about MySQL Cluster and whether it’s remotely suitable for their application. Often the answer can be “I think you’re looking for replication”, which is perfectly okay too.

I’m in town a few days early (and around a few days after) – so if you’re in the area do give me a yell – it’d be cool to hang.

FYI, I’m giving the following sessions:

  • MySQL Cluster: The Complete Tutorial (Parts I and II)
    Which is a total of 6hrs of MySQL Cluster goodness. It’s aimed at people who know MySQL (or are pretty good with other RDBMSs and can fake it) and want to know about MySQL Cluster. It’s a hands-on tutorial, so be prepared!
  • Introduction to MySQL Cluster
    A 45-minute whirlwind introduction to MySQL Cluster. Assumes some MySQL knowledge. Good if you’ve heard about this cluster thing (even from just reading the title of this session) and want to know what it’s all about.
  • Exploring New Features in MySQL 5.1 Cluster
    A 45-minute blast of a session on what’s new for MySQL Cluster in the 5.1 release. This will cover just about everything that was in last year’s presentation on the same topic. So if you came to last year’s and come to this one again… I’m going to make fun of you for being a groupie :)
  • Bleeding Edge MySQL Cluster: Upcoming Cool Things
    A whole hour on the stuff you shouldn’t use in production. The topic list is sort-of known… it really is the latest and greatest that should be coming to a tree somewhere, sometime… this year. We’ll no doubt talk about online add node, online add/drop attribute, the multithreaded NDB kernel, API improvements and a whole lot more!
  • The Design and Internals of MySQL Cluster
    What happens under the hood in MySQL Cluster? Find out here! An hour for those with a real technical mind. If source code and network protocol descriptions scare you, this is possibly not for you – for everyone else, expect an hour of coolness.

Yes, there seems to be a “Stewart” track at the conf :) Apparently people enjoyed my sessions last year… so there was a tendency to accept my sessions this year.

Patching your mission-critical email syncing software on your live setup… my OfflineIMAP patch for today

I’ve used OfflineIMAP for quite a while now. On the whole I’m fairly happy with it. Today I sent this to the list:

Forgive the potentially bad python, not my native tongue :)

This patch is motivated by three things:
- offlineimap is extremely slow at syncing lots of locally deleted
messages
- offlineimap uses lots of memory
- LocalStatus files aren't written safely (a hard crash can cause
corruption)
        - I've been bitten by this in the past, causing a complete resync of
the folder... so I get duplicate messages.

I am currently using 4.0.14 (from Debian) with this patch. I used it to
convert the files and everything. Seems quite reliable and quick.

In my tests, execution time for a normal sync is about the same.

Execution time for when lots of messages have been deleted in a
reasonably sized folder (e.g. during re-organisation of mail folders) is
as much as 10x faster.

In my tests, running with 1 thread uses as much as 20% less memory with
this patch (i.e. about 160MB instead of 200MB+ for my maildir)

Disk space used by the LocalStatus files isn't much more... for me it
looks like it's 6.5MB now versus 4.5MB then. We get the added benefit of
indexes for all our queries... nice :)

I had to disable the threading for copying messages as this means that
LocalStatus objects are shared between threads, which pysqlite doesn't
like (it asserts).

I think the part of this patch that implements the uidexists does
actually slow things down compared with having the messagelist.... a
more optimal implementation may be possible, but I think the other speed
improvements (and memory savings) are worth it.

A future patch may convert other storage types to sqlite (or similar) to
further reduce memory consumption (and hopefully runtime).

This does add a dependency on pysqlite... which is packaged in debian
(and ubuntu) - and i'm using the stock packages for these.

Comments very much appreciated.
Of course, the patch is here. I’m using it now… although I’ll warn you that it does update your .offlineimap to a new format (and doesn’t provide a way to go back, short of restoring the backed-up LocalStatus files and probably getting duplicate messages).

So, those around the MySQL circles I tend to hang around in may ask “Why not libmysqld?” (the embedded MySQL server). Well… a few reasons… sqlite is file-per-db (even though I’m essentially using file-per-table here), the python bindings are everywhere (and work), it’s tiny, and it’s crash safe.
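To make the crash-safety point concrete, here’s the idea in miniature via the sqlite3 C API (which is what pysqlite wraps). The file and table names are made up for illustration – the actual patch does this from python – but the property is the same: status updates happen inside a transaction, so a hard crash leaves you with the old consistent state rather than a corrupt file.

#include <sqlite3.h>

int main(void)
{
  sqlite3 *db;
  if (sqlite3_open("LocalStatus.sqlite", &db) != SQLITE_OK)
    return 1;

  sqlite3_exec(db,
               "CREATE TABLE IF NOT EXISTS status"
               " (uid INTEGER PRIMARY KEY, flags TEXT)",
               NULL, NULL, NULL);

  /* all-or-nothing: either every row hits disk or none do */
  sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
  sqlite3_exec(db, "INSERT OR REPLACE INTO status VALUES (42, 'S')",
               NULL, NULL, NULL);
  sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);

  sqlite3_close(db);
  return 0;
}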

You may also ask “Why?”… well, I’ve been re-organising a bunch of mail folders, which means deleting a *lot* of messages from some folders (and moving them to others)… offlineimap has been really slow at this. So I fixed it, with code (not whining).

I also wrote a bit-of-a-hack perl script to remove duplicate messages from a bunch of folders (a bug in offlineimap had caused me to get several copies of each message in a bunch of my folders a while ago). So that script is here. Commented out are bits to do comparison via md5 as well as message-id. Don’t use it unless you know what you’re doing… it may also use a few hundred MB of RAM on large (few hundred thousand message) folders.

Hopefully these will help improve my productivity.
Now, back to my regular programming….

JBOD can bite you… (and Ubuntu 7.04)

Okay, so one of the disks in a JBOD (well… a single LVM) has been on the way out (hopefully I can recover some stuff off it… there’s nothing completely important… but still).

I’ve now learnt my lesson, and the desktop has three new 320GB drives in a RAID5.

Currently installing Ubuntu 7.04 on it. I do have to say that the alternate install disk (which uses debian-installer) has a REALLY nice RAID and LVM setup now. If only it also let you pass parameters to mkfs it would be ideal.

Update: It got the bootloader horribly wrong though, and I’ve had to piss-fart around trying to get LILO to install and boot. Current result? Blinking cursor in top left of screen. Fantastic… fucking fantastic.

organisation (and news)

Today it was pointed out to me that news of me not being satisfied with my level of organisation for a trip wasn’t news – the fact of me being organised would have been, though.

faster net is da bomb!

So, while I was away it seems that Telstra enabled the 8Mbit down/1Mbit up stuff in their ADSL equipment in the exchanges (which has been possible for quite some time).

I enabled/upgraded my Internode plan to get the faster speed. It got activated or whatever yesterday, but I didn’t really see an improvement. Anyway, I headed over to the Internode support site to check out their setup instructions for my ADSL modem – turns out that simply by changing from PPPoE to PPPoA I’ve gotten a huge speed boost.

Just pulled an LCA video from the Internode mirror at 862K/s. Rock.

NDB Online Add Node Progress (or rather, testing it)

So, the sitch as of today:

Added an ndb_mgm_set_configuration() call to the mgmapi – a not-so-casually evil API call that sends a packed ndb_mgm_configuration object (like what you get from ndb_mgm_get_configuration()) to the management server, which then resets its lists of nodes for event reporting and for ClusterMgr and starts serving things out of this configuration. Notably, if a data node restarts, it gets this new configuration.

By itself, this will let us write test programs for online configuration changes (e.g. changing DataMemory).
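Roughly, such a test could look like the sketch below. The exact signature of the new ndb_mgm_set_configuration() call is my shorthand for what’s described above; the other mgmapi calls are the existing ones.

#include <mgmapi.h>

int bump_config(const char *connectstring)
{
  NdbMgmHandle h = ndb_mgm_create_handle();
  ndb_mgm_set_connectstring(h, connectstring);
  if (ndb_mgm_connect(h, 0, 0, 0) != 0)
    return -1;

  /* fetch the configuration the management server currently serves */
  struct ndb_mgm_configuration *conf = ndb_mgm_get_configuration(h, 0);

  /* ... walk the DB node sections and, say, bump DataMemory ... */

  /* hand the modified configuration back to the management server */
  int ret = ndb_mgm_set_configuration(h, conf);

  ndb_mgm_destroy_configuration(conf);
  ndb_mgm_disconnect(h);
  ndb_mgm_destroy_handle(&h);
  return ret;
}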

I’ve also added a Disabled property to data nodes. If set, just about everywhere ignores the node.

This allows a test program to test add/drop node functionality – without the need for external clusterware stopping and starting processes.

If you start with a large cluster, we can get a test program to disable some nodes and do an initial cluster restart (essentially starting a new, smaller cluster) and then add in the disabled nodes to form a larger cluster. Due to the way we do things, we actually still have the Transporters to the nodes we’re adding, which is slightly different from what happens in the real world. HOWEVER, it keeps the test program independent of any mechanism for starting a node on a machine – so I don’t (for example) need to run ndb_cpcd on my laptop while testing.

But anyway, I now have a test program that we could run as part of autotest, which will happily take a 4 node cluster, start a 2 node cluster from it, and online add 2 nodes.

Adding these new nodes into a nodegroup is currently not working with my patches though… for some reason the DBDICT transaction seems to not be going through the prepare phase… no doubt a bug in my code relating to something that’s changed in DBDICT in the past year.

So there is progress towards having the ability to add data nodes (and node groups) to a running cluster.

Online table re-organisation is another thing altogether though… and no doubt some good subtle bugs to be written.

mgmapi timeouts going in…

So my timeout patches for the MySQL Cluster Management API have been finished. This should solve a lot of people’s problems writing management API applications that want to do something sane when the management server either dies or somehow gets disconnected from you.
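For mgmapi users, taking advantage of it should be about as simple as you’d hope – something like the sketch below (I’m assuming the call ends up exposed as ndb_mgm_set_timeout(), taking milliseconds):

#include <mgmapi.h>

NdbMgmHandle h = ndb_mgm_create_handle();
ndb_mgm_set_connectstring(h, "mgmhost:1186");

/* mgmapi operations now fail after 5 seconds instead of blocking
   forever on a dead or disconnected management server */
ndb_mgm_set_timeout(h, 5000);

if (ndb_mgm_connect(h, 0, 0, 0) != 0) {
  /* handle the error and retry (or give up) sanely */
}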

More importantly I should say, the autotest run looks good. It passed 199 tests in the daily-basic suite… which is a new record (I added some tests, so that could be classified as cheating)… probably would have been 200 if a sporadically failing test hadn’t failed :(

During my trip back to Melbourne, Jonas will probably apply these to a bunch of trees (at least some of the telco release) – with 5.1 coming at some point.

Bus Drivers

(as of the other day) Stockholm is now the only city where I’ve boarded a bus to see what I can only describe as a stunningly gorgeous bus driver.

The good news is she also passed the test for bus drivers – being able to drive. That seems to be a quality often found here. Back home, when I regularly took a bus (to uni) it was inevitably a bit hit and miss.

I heart Gnome SSH Tunnel Manager

Jonas just switched me on to Gnome SSH Tunnel Manager – a simple GNOME app that stores a list of SSH tunnels you want and can automatically start and stop them.

Totally useful for those who travel (hrrm.. fair few MySQLers there) and/or always have SSH tunnels to places (hrrm… MySQLers there too).

There’s a debian package up there (and you can build one easily) but it’s not yet in the Ubuntu archive… maybe for the next release. It works fine on edgy for me though!

irritation of the day….

There’s a lot of things about the MySQL bug tracking system I like… but there’s a few things that annoy the heck out of me.

Today it’s the fact that if you put a term in the “with any of the words” field on advanced search that’s a number (e.g. ‘839’, because you’re looking for bugs that talk about error 839) you get taken straight to bug number 839. Funnily enough, this has nothing to do with the NDB problem I’m trying to check the status of. grr…

and now, back to your regular programming…

Code size of an engine versus test suite

If you count the lines of code in the MySQL Cluster (NDB) test suite (mysql-5.1/storage/ndb/test – excluding the old ODBC stuff) you come up with about 104,000 lines of code. This is in contrast to the approximately 350,000 other lines of code for the NDB engine (excluding the handler, which is an additional 12,000 lines – the handler isn’t tested much by the NDB test suite… mysql-test-run.pl is meant to take care of a lot of that).

If you go and check the MyISAM tree, it’s only 40,545 lines of code – for the entire engine. That’s right: the MySQL Cluster test suite is about 2.5 times the size of MyISAM.

If you look at the mysql-test-run.pl tests, which are just lists of SQL commands with static data, they come to 250,000 lines (excluding result files). The NDB tests do things programmatically – so they can generate large amounts of data and different loads quite easily.

The architecture of the NDB tests (commonly referred to as autotest, ATRT or the HUGO framework) is very different from mysql-test-run.pl – it easily allows you to write a test that is high on concurrency, high on load and high on amount of data. It’s also modular, so when you get an issue from a customer (or need to do some benchmarking on a specific type of schema) you can use the utility programs to help you (e.g. there’s one that does random PK updates to tables, one that does scans, one that does index operations etc.).
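For a flavour of what that looks like, here’s roughly how a hugo-based test drives a table (written from memory of storage/ndb/test – treat the exact signatures as approximate):

#include <NDBT.hpp>
#include <HugoTransactions.hpp>

int load_and_hammer(Ndb *pNdb, const NdbDictionary::Table *tab)
{
  HugoTransactions hugo(*tab);

  /* generate and insert 10000 rows of test data */
  if (hugo.loadTable(pNdb, 10000) != 0)
    return NDBT_FAILED;

  /* random primary key updates across the whole table */
  if (hugo.pkUpdateRecords(pNdb, 10000) != 0)
    return NDBT_FAILED;

  return NDBT_OK;
}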

There’s this whole bunch of things you just cannot do with mysql-test-run.pl.

Then we get to fault injection… MySQL Cluster is a distributed system that is designed to withstand failure. Without testing this, we can never say it’s remotely HA. So we test it. A lot. We inject failures into nodes to check our node failure handling, and using the utility programs and some basic shell it’s possible to do custom tests (such as multi-node failure) where our test suite doesn’t have the best coverage yet.

Again, either not possible or extremely hard with mysql-test-run.pl.

mysql_slap is the hint of a nice utility to help in testing… but using it in mysql-test-run.pl scripts in a verifiable way (i.e. checking that what came out is what went in, using a variety of access methods – full table scans, pk lookups, index scans, stored procs, cursors, views, joins etc.) is tricky at best (really, it’s impossible).

Yes, I’m really pining for a better test suite infrastructure for the MySQL Server – it can only lead to better quality software…. even just somebody rewriting a bunch of the hugo classes to use the MySQL C API would be useful.

Pleading for a better mail suite….

or really just all the Evolution bugs that I consistently hit to be fixed.

Why does it need hundreds of megabytes of memory just to list a single mail folder? What could it possibly be doing – loading the entire mailbox into memory? ick…

currently, after a crash, it’s been “checking folder consistency” for at least 10 minutes now… and apparently I have 13,000 unread messages in INBOX. Bull. About 250 is more likely. This will probably be some arse-load of crack where I’ll have to remove the cache files, restart evo 10 times and sacrifice a goat to the gods of crackful annoying-apps.

mgmapi timeouts and resurrecting the online add node

The other day I managed to send off what are nearly the final patches for adding proper timeout support to the MySQL Cluster management API. Jonas has had a bit of a look, found one thing I’d missed, but it’ll probably get in somewhere soon (probably the carrier grade edition first, then others… 5.1 makes sense IMHO, if only for the amount of management server testing that my patches add).

Unfortunately, in what we laughingly call the past, the management server – for whatever hysterical raisins – never really received much direct testing. Sure, if the data nodes couldn’t get their configuration, or autotest couldn’t control the daemons, then things were obviously broken. But, say, a subtle (or not-so-subtle) change in API or behaviour would certainly not be picked up.

The real “feature of the year” (not my words), though, is fault injection for the management server that we can use in testing. The MySQL Cluster kernel (data nodes) already has extensive fault injection that is regularly tested by ATRT (storage/ndb/test in the source tree).

I’ve also started to resurrect my online add node patch, which I’ve had sitting around in various states for over a year (actually… about 14 months… I just haven’t touched it in 12), and port it to the latest 5.1 tree (as I’m not sure where it’ll end up, start at the lowest common denominator – it’s possible it’ll end up in Carrier Grade first too). Now comes the problem of testing the sucker. Previously I had a shockingly bad shell script and hard-coded files to make this go.

Obviously, hard-coded stuff is not the way to go. The real way is to be able to do everything neatly and programmatically so we can run it as part of the regular autotest.
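So the end result should look like any other NDBT test – something along these lines (a sketch using the NDBT macros from storage/ndb/test/ndbapi; the test and step names here are invented):

#include <NDBT_Test.hpp>

int runStartSmallCluster(NDBT_Context *ctx, NDBT_Step *step)
{
  /* disable some data nodes, initial restart the rest */
  return NDBT_OK;
}

int runOnlineAddNodes(NDBT_Context *ctx, NDBT_Step *step)
{
  /* re-enable the disabled nodes and add them to a new nodegroup */
  return NDBT_OK;
}

NDBT_TESTSUITE(testOnlineAddNode);
TESTCASE("AddNodes",
         "Start a smaller cluster, then online add the spare nodes")
{
  INITIALIZER(runStartSmallCluster);
  STEP(runOnlineAddNodes);
}
NDBT_TESTSUITE_END(testOnlineAddNode);

int main(int argc, const char **argv)
{
  ndb_init();
  return testOnlineAddNode.execute(argc, argv);
}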