Adverts or prettiness?

I normally read Planet MySQL via RSS, but for some reason I ended up on the actual site today in a web browser (Epiphany, to be exact) and saw this:

planetmysql-ugly.jpg

And thought, “wow, ugly”. I don’t keep my browser “maximised” because I think it’s a stupid way to work – I often switch between tasks or like to have an editor open while referring to something in a browser (e.g. some tech details of some source module), or monitor IRC or IM. I remembered that Epiphany has an Ad Blocking extension, so in an effort to de-uglify, I enabled it. I now see:

planetmysql-pretty.jpg

Hrrm… much better. Notice how the links on the left to the most active feeds are actually useful now (I can read them).

Note that this isn’t a rant about adverts on web sites – I can handle them (the Google ones, which aren’t obtrusive) – I’m against the uglyweb.

pluggable NDB

Spoke with Brian the other day about what was required to get NDB to be a pluggable engine – and started hacking.

The tricky bits involve the dependencies of things like mysqldump and ndb_restore on some headers to determine which tables shouldn’t be dumped (hint: the cluster database used for replication).

Also, all those command line parameters and global variables – they’re fun too. It turns out InnoDB and PBXT are also waiting on this. In the meantime, I’ve done a hack that puts config options in a table.

Currently blocked on getting the embedded server (libmysqld) to build properly – but I now have a sql/mysqld binary with pluggable NDB. All libtool foo too.

Hopefully I’ll be able to follow up soon with an “it works” post.

CREATE, INSERT, SELECT, DROP benchmark

Inspired by PeterZ’s Opening Tables scalability post, I decided to try a little benchmark. This benchmark involved the following:

  • Create 50,000 tables: CREATE TABLE t{$i} (i int primary key)
  • Insert one row into each table
  • select * from each table
  • drop each table

    I wanted to test the file system impact on this benchmark, so I created a new LVM volume, 10GB in size. I extracted a ‘make bin-dist’ of a recent MySQL 5.1 tree, did a “mysql-test-run.pl --start-and-exit” and ran my script, timing real time with time(1).

    For a default ext3 file system creating MyISAM tables, the test took 15min 8sec.

    For a default XFS file system creating MyISAM tables, the test took 7min 20sec.

    For an XFS file system with a 100MB version 2 log creating MyISAM tables, the test took 7min 32sec – within run-to-run variation of the default XFS result, so log size and version made no real difference.

    For a default reiserfs (v3) file system creating MyISAM tables, the test took 9min 44sec.

    For an ext3 file system with the dir_index option enabled creating MyISAM tables, the test took 14min 21sec.

    As an approximate measure of CREATE performance: ext3 and reiserfs averaged about 100 tables/second (although after the 20,000 mark, reiserfs seemed to speed up a little), while XFS averaged about 333 tables/second. I credit this to XFS performing the does-the-file-exist check as a B-tree lookup once the directory reaches a certain size.

    Interestingly, DROPPING the tables was amazingly fast on ext3 – about 2,500/sec, versus about 1,000/sec for XFS. So ext3 can destroy faster than it can create, while XFS keeps pace with itself.

    What about InnoDB tables? Well…

    ext3(default): 21m 11s

    xfs(default): 12m 48s

    ext3(dir_index): 21m 11s

    Interestingly, the create rate for XFS was around 500 tables/second – half that of MyISAM tables.

    These are interesting results for those who use a lot of temporary tables or do lots of create/drop tables as part of daily life.

    All tests were performed on a Western Digital 250GB 7200rpm drive in a 2.8GHz 800MHz FSB P4 with 2GB memory running Ubuntu 6.10 with HT enabled.

    At the end of the test, the ibdata1 file had grown to a little over 800MB – still small enough to fit in memory. If we increased this to, say, 200,000 tables (presumably about a 3.2GB file), it wouldn’t fit in cache, and the extents of XFS would probably make it perform better for INSERT and SELECT queries than the list of blocks that ext3 uses. For data sets smaller than memory this doesn’t matter, because the Linux kernel caches the in-memory-block to disk-block mapping, making the file system’s efficiency at that lookup irrelevant.

    So go tell your friends: XFS is still the coolest kid on the block.

    8.6GB of email

    If you tar my Maildir, it comes out at about 8.6GB currently. That’s about all my mail since October 2001. Notable exceptions are most of the Spam I’ve received and any messages from LKML.

    Doing a first-time sync with offlineimap takes an amount of time that is truly scary – over 8 hours, while connected directly to the IMAP server over 100baseT.

    With only 2.7GB synced so far (on one of my machines), there have been about 210,000 messages transferred. There are about 500,000 messages to go, so a total time of around 24hrs. Eep.

    Thunderbird did an offline download of the 700MB of my INBOX very quickly (i.e. at a speed that was encouraging and made me not stop it before it completed). Evolution seemed really slow, and wants to put header and body in separate files (hrrm… that’s going to be a lot of files). However, I am not ready to switch mail clients to Thunderbird (for a variety of reasons).

    I tried switching to dovecot over the weekend – it didn’t turn out so great. The sync speed was even slower, and syncing from remote dovecot to local dovecot was horrifically slow (it probably would have completed by the end of the week). Key word: probably.

    I don’t consider my Maildir to be big. I don’t get many large attachments.

    The fact that Evolution uses about 250MB of memory when I start it up, and 300MB now after sending a few mails, is sort of disturbing. Although this does seem to be down from previous versions – so hooray to the Evolution team for that.

    I wonder why such a key piece of infrastructure seems to be so neglected.

    Disk allocation, XFS, NDB Disk Data and more…

    I’ve talked about disk space allocation previously, mainly revolving around XFS (namely because it’s what I use, it’s a sensible choice for large file systems and large files, and it has a nice suite of tools for digging into what’s going on).

    Most people write software that just calls write(2) (or libc things like fwrite or fprintf) to do file I/O – including space allocation. Probably 99% of file I/O is fine to do like this, and the allocators for your file system get it mostly right (some more right than others). Remember, disk seeks are really, really expensive, so the fewer you have to do, the better (i.e. fragmentation == bad).

    I recently (finally) wrote my patch to use xfsctl to get better allocation for NDB disk data files (datafiles and undofiles). The patch is at:
    http://lists.mysql.com/commits/15088

    This actually ends up giving us a rather nice speed boost in some of the test suite runs.

    The problem is:
    – two cluster nodes on 1 host (in the case of the mysql-test-run script)
    – each node has a complete copy of the database
    – ALTER TABLESPACE ADD DATAFILE / ALTER LOGFILEGROUP ADD UNDOFILE creates files on *both* nodes. We want to zero these out.
    – files are opened with O_SYNC (IIRC)

    The patch I committed uses XFS_IOC_RESVSP64 to allocate (unwritten) extents and then posix_fallocate to zero out the file (the glibc implementation of this call just writes zeros out).

    Now, ideally it would be beneficial (and probably faster) to have XFS do this in kernel. Asynchronously would be pretty cool too.. but hey :)

    The reason we don’t want unwritten extents is that NDB has some realtime properties, and futzing about with extents and the like in the FS during transactions isn’t such a good idea.

    So this led me to try XFS_IOC_ALLOCSP64 – which doesn’t have the “unwritten extents” warning that RESVSP64 does. However, with the two processes writing the files out, I get heavy fragmentation. Even with a RESVSP followed by an ALLOCSP I get the same result.

    So it seems that ALLOCSP re-allocates extents (even when it doesn’t have to) and really doesn’t give you much (I didn’t do much timing to see if it was any quicker).

    I’ve asked if this is expected behaviour on the XFS list… we’ll see what the response is (I haven’t had time yet to go read the code… I should, though).

    So what improvement does this patch make? Well, I’ll quote my commit comments:

    BUG#24143 Heavy file fragmentation with multiple ndbd on single fs
    
    If we have the XFS headers (at build time) we can use XFS specific ioctls
    (once testing the file is on XFS) to better allocate space.
    
    This dramatically improves performance of mysql-test-run cases as well:
    
    e.g.
    number of extents for ndb_dd_basic tablespaces and log files
    BEFORE this patch: 57, 13, 212, 95, 17, 113
    WITH this patch  :  ALL 1 or 2 extents
    
    (results are consistent over multiple runs. BEFORE always has several files
    with lots of extents).
    
    As for timing of test run:
    BEFORE
    ndb_dd_basic                   [ pass ]         107727
    real    3m2.683s
    user    0m1.360s
    sys     0m1.192s
    
    AFTER
    ndb_dd_basic                   [ pass ]          70060
    real    2m30.822s
    user    0m1.220s
    sys     0m1.404s
    
    (results are again consistent over various runs)
    
    similar for other tests (BEFORE and AFTER):
    ndb_dd_alter                   [ pass ]         245360
    ndb_dd_alter                   [ pass ]         211632

    So what about the patch? It’s actually really tiny:

    
    --- 1.388/configure.in	2006-11-01 23:25:56 +11:00
    +++ 1.389/configure.in	2006-11-10 01:08:33 +11:00
    @@ -697,6 +697,8 @@
    sys/ioctl.h malloc.h sys/malloc.h sys/ipc.h sys/shm.h linux/config.h \
    sys/resource.h sys/param.h)
    
    +AC_CHECK_HEADERS([xfs/xfs.h])
    +
     #--------------------------------------------------------------------
    # Check for system libraries. Adds the library to $LIBS
    # and defines HAVE_LIBM etc
    
    --- 1.36/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2006-11-03 02:18:41 +11:00
    +++ 1.37/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2006-11-10 01:08:33 +11:00
    @@ -18,6 +18,10 @@
    #include
    #include
    
    +#ifdef HAVE_XFS_XFS_H
    +#include <xfs/xfs.h>
    +#endif
    +
     #include "AsyncFile.hpp"
    
    #include
    @@ -459,6 +463,18 @@
    Uint32 index = 0;
    Uint32 block = refToBlock(request->theUserReference);
    
    +#ifdef HAVE_XFS_XFS_H
    +    if(platform_test_xfs_fd(theFd))
    +    {
    +      ndbout_c("Using xfsctl(XFS_IOC_RESVSP64) to allocate disk space");
    +      xfs_flock64_t fl;
    +      fl.l_whence= 0;
    +      fl.l_start= 0;
    +      fl.l_len= (off64_t)sz;
    +      if(xfsctl(NULL, theFd, XFS_IOC_RESVSP64, &fl) < 0)
    +        ndbout_c("failed to optimally allocate disk space");
    +    }
    +#endif
     #ifdef HAVE_POSIX_FALLOCATE
    posix_fallocate(theFd, 0, sz);
    #endif

    So get building your MySQL Cluster with the XFS headers installed and run on XFS for sweet, sweet disk allocation.

    Programme – linux.conf.au 2007

    The Programme for linux.conf.au 2007 has hit the streets (err.. web) and it’s looking pretty neat.

    I’m glad to see the MySQL and PostgreSQL miniconfs on different days – it means I should be able to pop into the PostgreSQL one as well. The kernel miniconf could be interesting too… I guess it depends on the sessions and stuff though.

    Greg Banks’ session on “Making NFS Suck Faster” should be interesting. Tridge’s session on “clustering tdb – a little database meets big iron” should be really interesting (after all, I hack on a clustered database for a crust). After lunch, I’m a bit torn between a few sessions – but Matthew Garrett‘s “Fixing suspend for fun and profit” could be a laugh.

    The next session will involve last-minute jitters for my own session (which is right after): “eat my data: how everybody gets file IO wrong” – which will be great fun, as there will no doubt be a bunch of smart people about, ready to expand and clarify things.

    By the end of the day I’ll be torn between Keith Packard’s “X Monitor Hotplugging Sweetness” (hopefully the extension will be called XBLING – as I keep trying to convince him to call an X extension that) and Garbage Collection in LogFS by Jorn Engel.

    On Thursday, I’ll want to be in all the sessions at once – including Practical MythTV, as presented by Mikal Still and myself. If you’re not in our session (and damn you for not being :)) you should check out the no doubt other great things on: Dave Miller on Routing and IPSEC Lookup Scaling in the Linux Kernel should be great fun, OzDMCA by Kim Weatherall will no doubt bring a tear to the eye, and Rasmus on Faster and Richer Web Apps with PHP 5 (apparently the aim when coding PHP is to not suck… so a lot of PHP “programmers” should take note – and ask to see how fast he can down a beer). Andrew Cowie is talking on writing rad GTK apps (always fun when you can see something from your coding efforts). The photographer side of my brain is telling me to go to the GIMP tutorial too. Hrrm… busy day (but our MythTV tute will ROCK – so show up and be converted).

    After a morning Berocca (err… tea), the NUMA sessions sound interesting (especially the one on memory-mapped files – I’m going to be thinking about this and databases, oddly enough). Lunch, then the Heartbeat tutorial sounds interesting (from a “we have an internal one and I wonder what this does” PoV).

    Ted Ts’o is on enterprise real time… could be interesting as Ted’s a fun guy.

    On Friday, Ted’s ext4 talk is a must-see – especially for poking him in the ribs about what would be neat from a DB PoV (and reminding him of the real benchmark performance boost we see with XFS versus ext3).

    I want to be a cool kid like Rusty, but Disk Encryption also sounds interesting, and Robert Collins could be talking about some interesting stuff (although the title “do it our way” isn’t giving much away).

    So, I’ve pretty much just planned a week in January down to the hour. If you’re not already going – get booked for linux.conf.au 2007 now, as it’s sure to sell out quickly. It’s going to be totally kick-ass.