Using Dtrace to find out if the hardware or Solaris is slow (but really just working around the problem)

A little while ago, I was the brave soul tasked with making sure Drizzle was working properly and passing all tests on Solaris and OpenSolaris. Brian recently blogged about some of the advantages of also running on Solaris and the SunStudio compilers – more warnings from the compiler is a good thing. Many kudos goes to Monty Taylor for being the brave soul who fixed most of the compiler warnings (and for us, warnings=errors – so we have to fix them) for the SunStudio compilers before I got to making te tests work.

So, I got to the end of it all and got pointed to an OpenSolaris x86 box where the drizzleslap test was timing out. The timeout for tests is some amazingly long amount of time – 15 minutes. All the drizzle-test-run tests are rather short tests.

To make running the tests quick, I usually LD_PRELOAD libeatmydata – a simple way of disabling pesky things like fsync that take a long time (rumors that I nickname it libmacosxsimulation are entirely true). It’s pretty simple to build libeatmydata on Solaris too (I periodically do this and always intend to check in the associated Makefile but never do).

Unfortunately, on OpenSolaris a bunch of things are built 32bit and others 64bit and just doing “LD_PRELOAD=libeatmydata.so ./dtr” doesn’t work – I’d have to modify the test script to only do the LD_PRELOAD for drizzled – which is annoying.

On my T1000 running Debian, the drizzleslap test takes 42 seconds to complete with libeatmydata, or 393 seconds when it’s really doing fsyncs. So for it to be timing out on this OpenSolaris x86 box – i.e. taking more than 15 minutes, was strange.

So… what was going on? Step 1: is anything actually going on? One way to test this is to see if disk IO is being generated. On Linux, we can use “iostat”. On Solaris, we can use “zpool iostat”. Things were going to disk for the whole time of the test. Time to compare what the difference between the platforms is.

Well.. a typical way that tests have taken forever have been because of lots of transactions: i.e. lots of fsync(). You are then dependent on the fsync() performance.

If we look at “iostat -x” and the avgrq-sz field on Linux, we’ll see that the average request size is on 10-12 sectors (512 byte blocks). i.e. about 5 or 6kb.

If we look at “zpool iostat 1″ on OpenSolaris, we see a bit of a different story, but similar enough that you could safely assume that lots of small synchronous IOs were going on. After a bit of reading of the ZFS on-disk format documents, I had a slightly better idea what was going on that could be causing me seeing a larger average request size on ZFS than on Linux with XFS.

So… perhaps it’s the speed of these syncs? Ordinarily, I’d just write up a quick LD_PRELOAD library that wraps fsync() and times it (perhaps writing to a file so I could do analysis on it later). Since I was working on Solaris… I thought I’d try DTrace. Some google-foo and dtrace hacking later, I tried this:

stewart@drizzle-dev:~/drizzle/sparc$ time pfexec dtrace -n ‘syscall::fdsync:entry /execname == “drizzled” / { self->ts[self->stack++] = timestamp; } syscall::fdsync:return /self->ts[self->stack - 1]/ { this->elapsed = timestamp – self->ts[--self->stack]; @[probefunc] = count(); @a[probefunc] = quantize(this->elapsed); self->ts[self->stack] =0; }’

dtrace: description 'syscall::fdsync:entry ' matched 2 probes
^C

  fdsync                                                         1600
  fdsync
           value  ------------- Distribution ------------- count
        33554432 |                                         0
        67108864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1520
       134217728 |@@                                       79
       268435456 |                                         1
       536870912 |                                         0        

real	4m26.837s
user	0m0.657s
sys	0m0.566s

Which did seem like an awful long time for an fsync() to take. Although the filesystem was on a single disk, it was meant to be made remotely recently, and it’s sitting on a Sun controller… so it should be a bit better than that. From reading some of the ZFS on-disk spec, it could be some bug that means we’re waiting for a checkpoint to be written instead of forcing the sync out when we call fsync() – but I sought another solution (as on other Solaris/OpenSolaris systems this wasn’t a problem – so perhaps fixed in newer kernels or it’s a driver issue).

So I went and added “–commit=100″ to a bunch of places in the drizzleslap test to batch things into transactions. The idea being to greatly reduce the number of fsync() calls to bring the execution time of the drizzleslap test on the machine to get below 15minutes. A bit of jiggerypokery later (some tests needed to not have the –commit to avoid various locking foo) and I had something that should run.

Now, ~113 seconds on the T1000 on Linux (with a single SATA disk, down from an original 393 seconds) and ~437 seconds on the OpenSolaris box. For giggles, tried it on a Solaris box that’s running UFS on a 10k RPM SAS drive: ~44 seconds.

In Summary:

T1000, Linux, libeatmydata, XFS: ~42 seconds (before optim)
T1000, Linux, 7200RPM SATA, XFS: ~113 seconds
T5240, Solaris 10, 10k RPM SAS, UFS: ~44 seconds
16 core Xeon, OpenSolaris, 7200RPM, ZFS: ~437 seconds

So, on that hardware setup – something is strange. The 10k SAS drive on UFS on the CoolThreads box is really nice though…. makes me want that kind of disk here.

This page was useful, and I used it as a basis for some of my DTrace scripts: http://fav.or.it/post/1146360/dtrace-and-the-mighty-hercules

Also thanks to several people on #opensolaris on Freenode who helped me out with various Solaris specific commands in tracking this down.

OpenSolaris 2008.11 first impressions

Using the wonderful bittorrent, I got the CD image in next to no time (in contrast to the Solaris Express image I’m currently downloading via HTTP that’s taking forever).

Boot time in VirtualBox (off the ISO image) was rather quick, usual questions on keyboard layout and desired language (it’d be neater if these were GUI questions… but anyway). The GNOME desktop loaded up, popup window informed me that it had connected to the network. Awesome.

Package manager: opend quickly, using the opensolaris.org repository and it does seem to have a lot of packages… even MySQL 5.0.67 (and 4.0.24). Not 5.1 though, but it is early days (and it was just released as GA the other day).

At least one unusual thing was SUNWgrub and SUNWgrubS (where the S is for source). I assume this is some packaging oddity as I don’t ese other packages like this.

SUNWii wins the odd package name award.

The Time Slider seems like possibly the most awesome thing ever. It periodically takes ZFS snapshots of your disk and presents you with a time slider in nautilus so you can just view your data how it was in the past (at previous snapshots).

I can’t see how to change the keyboard layout to DVORAK (at least while booted off the CD image).

The getting started guide also shows how to get a development environment going… this is quite promising. Will do proper install shortly and do a step by step “building drizzle on OpenSolaris 2008.11″ post.

Firefox on OpenSolaris fixed (and installed bzr)

Thanks to Glynn for pointing me to the right thread on opensolaris.org (in a comment on my Good adventures with OpenSolaris post). The package verification thingy (pkg verify -v -f SUNWfirefox) did actually throw an error (indicating some sort of problem). So that’s pretty neat. The fact that it got into trouble in the first place isn’t good, but corruption detection is the next best thing.

I still occationally hit the bug in VirtualBox where if you have 127.0.0.1 in your resolv.conf on your host (e.g. running a local caching nameserver), VirtualBox passes this through to the guest, so the guest tries to use the guest 127.0.0.1 as a nameserver – this usually doesn’t work so well.

The good news is, Firefox now works in my OpenSolaris VM.

The bad news is that even though I’ve gone and set my keyboard layout as DVORAK (with the Input Method Switcher applet), whath should be ctrl-l (for location bar) in Firefox, actually brings up the Print dialog (on DVORAK, L is where P is on QWERTY).

But, I’ve managed to download bazaar now, and the install was simple (just follow INSTALL in the bzr tarball). At some point I’ll badger someone to make an OpenSolaris package for it so you could do “pkg install bzr”, but you can’t do that yet.

The next challenge will be to branch repositories from the host onto a temp drive, build and test.

Good adventures with OpenSolaris

First of all, thanks to everyone who commented on my previous OpenSolaris entry (which wasn’t really positive at all).

I recently tried again – this time starting with an ISO of build 93. I’d recommend completely ignoring the 2008.05 release and going straight for the build 93 image.

Installed easily in VirtualBox, adding the VirtualBox extensions was easy. Select “Devices -> Install Guest Additions” in the VirtualBox menu, then when logged into the OpenSolaris install, do the following:

su

pkgadd -d /media/VBOXADDITIONS_1.6.0_30421/VBoxSolarisAdditions.pkg

(you then say yes, i really do want to install it. rather obvious. I had to do this step again after the “pkg image-update” below though). Just logging out and then back in again gets you all the awesomeness you’d expect from running other guests (such as that system released by a large corporation in Redmond).

The “pkg image-update” went as expected, and I’m now running build 94.

I installed SunStudio Express (compilers) pretty easily – “pkg install sunstudio”. Unfortunately, this is all in /opt/SunStudioExpress and not in $PATH, which would have been much more useful. I guess there’s still a bit to go before usability nirvana. Also, no .desktop entries, so have to explicitly run /opt/SunStudioExpress/bin/sunstudio to get the NetBeans gui. Presumably if i add /opt/SunStudioExpress/bin to PATH, building random software packages will be nicer.

So, I then want bzr so i can pull source repositories. Monty Taylor informs me that the magic packages you want are: SUNWgcc, gcc-dv and SUNWtoo. Then you can build bzr as downloaded from the website. Installed these easily.

However, now trying to get the bzr source:
$ firefox
ld.so.1: firefox-bin: fatal: /usr/lib/firefox/libxul.so: corrupt or truncated file

and then symbol kPStaticModules: referenced symbol not found.

So maybe I shouldn’t have upgraded to build 94…..

But certainly in much better shape than the may release, but be warned, it’s still a work-in-progress and some things may sporadically not work from time to time (e.g. like firefox and now).

Hopefully, some time soon I’ll get a MySQL build (well… really I want MySQL Cluster, and later drizzle) going and will really be able to hammer these things with dtrace.

Adventures with OpenSolaris

So… some colleagues have been experimenting with DTrace a bit, and I’ve been (for a while now) wanting to experiment with it.

The challenge now, instead of in the past, is that I’m setting up a Solaris based system – not getting one premade.

I chose OpenSolaris as I’d previously tried Solaris 10 and just sunk too much time trying to get updates and a development environment installed (another colleague could get the opposite to me going: he got devtools but no updates. at least mine was up to date and secure… but without a compiler).

So… OpenSolaris. It isn’t 100% open, there’s binary only drivers and such… but compared to previous Solaris, a whole lot better. Now, if only it was GPL licensed so we could have cross-pollination with Linux.

I grabbed the 2008.05 ISO as soon (in fact, slightly before) it was released and installed it in VirtualBox.

The installation was shiny – one of the best OS installs I’ve seen in a while. It set up nice things (zfs, X) and (an improvement on the previous release) even managed to get all the hardware going (not sound though).

However, on first reboot, nasty surprise. DNS isn’t enabled by default.

I found out why DNS isn’t enabled by default – and (as usual) this comes down to hysterical raisins. Back in what we laughingly call the past, during install Solaris would ask you what services you wanted to use for name resolution (which I guess made sense when people used yp/NIS more often than DNS). The default didn’t include DNS.

In the graphical installer, it just chose the default without asking… which is no DNS. So my mother would be able to install OpenSolaris, but once done, she’d have to know to type in 150.101.98.214 instead of www.google.com.au into Firefox. However, I swallowed my pride, edited /etc/nsswitch.conf and went along my business (I wonder the percentage of users who would actually go from “hrrm, internet not working” to editing /etc/nsswitch.conf without intense googling).

The UI did look nice though. Nice looking GDM, GNOME desktop looked nice. You could tell that whoever did the theme had spent too much time near MacOS X, but I’ll forgive them for that. The default shell is remotely sane and even though the bash completions aren’t as funky as on Ubuntu, I managed (unlike sitting at cmd.exe, where somebody is likely to die each time my keystrokes end up there).

I even had a look at the graphical package management tool – which looked quite nice. I even tried to do an update via it… which ended in what seemed to be a locked package manager and general amounts of fail. To see if it had just stopped or was chewing up my CPU or memory, I opened a terminal and ran ‘top’.

I then found out that top isn’t installed by default. It’s 57kb on my Ubuntu 8.04 laptop so disk space couldn’t be the reason why it’s not installed. It’s certainly not a “it’s a minimal install” argument, there’s lots of other things there by default.

Next step, let’s get updates (some time had elapsed between first install and now).

Seeing as I hadn’t met too much success with the graphical utility (it was at version 0.0000001 or something, so I don’t lay blame there). I find out that ‘pkg image-update’ is what you want to run. So I do.

It chugs for a while and says there’s 1GB of updates. That’s okay, I (where I=Sun) pay for what here on the arse end of the Internet is considered a decent link to my home office. About 20-30minutes later, having downloaded about 600MB, it goes “url timedout error” and aborts. Oh well I think, that’s easy – i’ll run it again and it’ll just resume downloading (remember the revolution when that started working, you know, in 1997).

I then discovered that pkg doesn’t resume downloads. It creates a snapshot using ZFS and puts the updates in it. If anything goes wrong, it just deletes the snapshot. This is a huge benefit over (say) dpkg, which if you press the reset button at the right time will leave your system very, very fucked (magic incantations can revive it, but it’s not fun – and the dpkg developers don’t think it’s a problem – come to my “Eat My Data” talk at OSCON to find out the full story). So OpenSolaris pkg wins on the “don’t ruin my working OS install already” front, but fails on resuming downloads.

I try again. Same story.

It’s now wasted a bit over 1GB of downloads… which equates to a couple of dollars.

I wait a few days, a week, and try again. Same story. I even try with a few hints found online that should fix things (well.. they did let another 100MB on average download before dying with the same story).

I then decided to just try and do the minimal – I wanted a development environment so I could build a MySQL Server with NDB and then play with DTrace to help nut out a performance problem or two.

So i tell pkg to install SunStudio Express. I’m even using instructions off sun.com, so it has to work.

It’s only ~500MB now (IIRC). Fails with exactly the same error as before (url timedout). Gah!

So, this brings us to today. I head into the Sun office.

I figure “this just has to work from a Sun office… ” and I was right!

It got through the (now) 1500MB download of updates!

It even applied them!

Success!

Win!

Well, no, – FAIL.

It now refused to boot with the updates. Or rather, it just rebooted soon after having started booting. No panic, no error screen, no “will reboot in 120 seconds” or anything useful. Instead, you just saw a flicker of the error message before it rebooted.

So… with some very careful pause/unpause of the VM (thanks VirtualBox… I also have a feature request now – pause before reboot :) I got this:

Aparrently the successful update, not so much.

Hrrm… perhaps select the known good one from the GRUB menu? It did actually boot! But this wasn’t just the old kernel, it was the whole older system. I guess that’s a possible upside of ZFS snapshots…. but oh my, that could be sooooo subtle and lead to data loss that it’s really quite dangerous.

I was still no closer to getting an up to date opensolaris system with enough developer tools to build a MySQL Server and use dtrace.

And this was enough. It’s now gone and I get my 10GB of disk back.

Maybe I’ll try again later… but I’m finding the google-perftools to be rather exciting and they’re really satisfying shiny thing urges at the moment.