New libeatmydata release: 105

Over on the project page and on Launchpad you can now download libeatmydata 105.

This release fixes a couple of bugs that came in via the Debian project, including a rather interesting one where some binaries never run .so constructors (so libeatmydata wasn’t properly initialised), and the open() code path in libeatmydata didn’t really deal with being called first in that situation.

Enjoy!

popcon-historical: a tool for monitoring package popularity in Debian/Ubuntu

I’ve just uploaded (where ‘just’ is defined as “a little while ago”) popcon-historical to GitHub. It’s a rather rudimentary way to look at the popcon data from Debian and Ubuntu over time. It loads all the data into a Drizzle database and then has a small perl web-app to generate graphs (and CSV).

GitHub: https://github.com/stewartsmith/popcon-historical

I’ve also put up a project page on it: https://flamingspork.com/popcon-historical/

An example is this graph of Percona Toolkit vs Maatkit installs in Ubuntu over time:

[graph: Percona Toolkit vs Maatkit installs in Ubuntu, image in the original post]

You can get it to graph any package. Unlike the graphs on debian.org, the package doesn’t have to be in the Debian archive to be graphed over time; it can be a package from third-party repos.

An argument for popcon

There is a package called popularity-contest that’s available in both Debian and Ubuntu (and likely other Debian derivatives). It grabs the list of packages installed on the machine and submits it to the Debian or Ubuntu popularity contests.

There you can see which are the most popular packages in Debian and Ubuntu. Unsurprisingly, dpkg, the package manager, is rather popular.

Why should you enable it? Popcon results give solid numbers on how many users a piece of software may have. The absolute numbers may not be too accurate, but it’s a sample set, and if you examine the results over time you can start to get an idea of whether your software is growing in popularity or not.

But there’s more to it than that: if you can prove that a lot of people are installing your software on Debian, then you’re likely going to be able to argue for more work time being spent on improving the packaging for Debian.

Quite simply, enabling popcon is a way to help people like me argue for more time being spent on making Debian better.
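Enabling it takes all of a minute on a Debian or Ubuntu box:

# install it; the package asks on install whether you want to participate
sudo apt-get install popularity-contest
# or, if it's already installed but switched off:
sudo dpkg-reconfigure popularity-contest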

Pogoplug as a NAS

A while ago (April) I bought a Pogoplug with the explicit idea of using it as a NAS device. I finally bought a new 2TB drive and plugged in the Pogoplug. I pretty much instantly realised I was going to run Debian on it instead, if only because that’s what makes me comfortable: Debian, ssh and XFS.

Installing Debian was easy (Google it), and incredible props for not only an attractive-looking device, but an easily hackable one (the default software is probably quite good for non-experts… I just happen to want Debian).
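Setting up the new disk was nothing special either. A quick sketch, assuming the drive shows up as /dev/sda (check dmesg) with a single partition on it:

# make an XFS filesystem on the new drive and mount it
mkfs.xfs /dev/sda1
mkdir -p /srv/nas
mount /dev/sda1 /srv/nas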

I have to say, I’m so far rather happy.

Debian unstable on a Sun Fire T1000

So I got the T1000 working again (finally, after much screwing about trying to get the part). I then hit the ever-annoying “no console” problem: the console didn’t work, which is kind of problematic.

After a firmware upgrade, and passing console=/dev/ttyS0 to the kernel, things work.
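For reference, that kernel argument goes on the append line in /etc/silo.conf. A sketch, assuming SILO as the boot loader (root device, partition and paths will differ per install):

# /etc/silo.conf (sketch, not verbatim from my machine)
root=/dev/sda4
timeout=100
image=/boot/vmlinuz
        label=linux
        append="console=/dev/ttyS0"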

So the T1000 firmware 6.3 doesn’t work with modern Debian kernels. Things work with 6.7 though.

Using DTrace to find out if the hardware or Solaris is slow (but really just working around the problem)

A little while ago, I was the brave soul tasked with making sure Drizzle was working properly and passing all tests on Solaris and OpenSolaris. Brian recently blogged about some of the advantages of also running on Solaris and the SunStudio compilers – more warnings from the compiler is a good thing. Many kudos to Monty Taylor for being the brave soul who fixed most of the compiler warnings (and for us, warnings=errors, so we have to fix them) for the SunStudio compilers before I got to making the tests work.

So, I got to the end of it all and got pointed to an OpenSolaris x86 box where the drizzleslap test was timing out. The timeout for tests is an amazingly long 15 minutes, and all the drizzle-test-run tests are rather short.

To make running the tests quick, I usually LD_PRELOAD libeatmydata – a simple way of disabling pesky things like fsync that take a long time (rumors that I nickname it libmacosxsimulation are entirely true). It’s pretty simple to build libeatmydata on Solaris too (I periodically do this and always intend to check in the associated Makefile but never do).

Unfortunately, on OpenSolaris a bunch of things are built 32-bit and others 64-bit, so just doing “LD_PRELOAD=libeatmydata.so ./dtr” doesn’t work; I’d have to modify the test script to do the LD_PRELOAD only for drizzled, which is annoying.

On my T1000 running Debian, the drizzleslap test takes 42 seconds to complete with libeatmydata, or 393 seconds when it’s really doing fsyncs. So for it to be timing out on this OpenSolaris x86 box (i.e. taking more than 15 minutes) was strange.

So… what was going on? Step 1: is anything actually going on? One way to test this is to see if disk IO is being generated. On Linux, we can use iostat; on Solaris, we can use zpool iostat. Things were going to disk for the whole time of the test. Time to compare the differences between the platforms.
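That is, roughly:

# Linux: extended per-device statistics, refreshed every second
iostat -x 1

# Solaris with ZFS: per-pool IO statistics, every second
zpool iostat 1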

Well… a typical reason for tests taking forever is lots of transactions, i.e. lots of fsync() calls. You are then dependent on fsync() performance.

If we look at iostat -x on Linux and the avgrq-sz field, we’ll see that the average request size is around 10-12 sectors (512-byte blocks), i.e. about 5 or 6KB.

If we look at zpool iostat 1 on OpenSolaris, we see a bit of a different story, but similar enough that you could safely assume lots of small synchronous IOs were going on. After a bit of reading of the ZFS on-disk format documents, I had a slightly better idea of what could be causing the larger average request size on ZFS than on Linux with XFS.

So… perhaps it’s the speed of these syncs? Ordinarily, I’d just write up a quick LD_PRELOAD library that wraps fsync() and times it (perhaps writing to a file so I could do analysis on it later). Since I was working on Solaris, I thought I’d try DTrace. Some google-fu and DTrace hacking later, I tried this:

stewart@drizzle-dev:~/drizzle/sparc$ time pfexec dtrace -n 'syscall::fdsync:entry /execname == "drizzled"/ { self->ts[self->stack++] = timestamp; } syscall::fdsync:return /self->ts[self->stack - 1]/ { this->elapsed = timestamp - self->ts[--self->stack]; @[probefunc] = count(); @a[probefunc] = quantize(this->elapsed); self->ts[self->stack] = 0; }'

dtrace: description 'syscall::fdsync:entry ' matched 2 probes
^C

  fdsync                                                         1600
  fdsync
           value  ------------- Distribution ------------- count
        33554432 |                                         0
        67108864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   1520
       134217728 |@@                                       79
       268435456 |                                         1
       536870912 |                                         0        

real	4m26.837s
user	0m0.657s
sys	0m0.566s

That did seem like an awfully long time for an fsync() to take. Although the filesystem was on a single disk, the disk was meant to be relatively recent, and it’s sitting behind a Sun controller, so it should do a bit better than that. From reading some of the ZFS on-disk spec, it could be some bug where we end up waiting for a checkpoint to be written instead of forcing the sync out when we call fsync(). But I sought another solution, as on other Solaris/OpenSolaris systems this wasn’t a problem (so perhaps it’s fixed in newer kernels, or it’s a driver issue).
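As an aside, if DTrace isn’t handy, truss can give a rough version of the same answer, minus the nice histogram (flags from memory, so check truss(1); -E prints the time elapsed within each system call):

# attach to the running drizzled and time each fdsync call
pfexec truss -E -t fdsync -p `pgrep drizzled`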

So I went and added --commit=100 to a bunch of places in the drizzleslap test to batch things into transactions. The idea was to greatly reduce the number of fsync() calls and bring the execution time of the drizzleslap test on this machine below 15 minutes. A bit of jiggerypokery later (some tests needed to not have the --commit, to avoid various locking foo) and I had something that should run.

Now: ~113 seconds on the T1000 on Linux (with a single SATA disk; down from the original 393 seconds) and ~437 seconds on the OpenSolaris box. For giggles, I tried it on a Solaris box that’s running UFS on a 10k RPM SAS drive: ~44 seconds.

In Summary:

T1000, Linux, libeatmydata, XFS: ~42 seconds (before the --commit change)
T1000, Linux, 7200RPM SATA, XFS: ~113 seconds
T5240, Solaris 10, 10k RPM SAS, UFS: ~44 seconds
16 core Xeon, OpenSolaris, 7200RPM, ZFS: ~437 seconds

So, on that hardware setup, something is strange. The 10k RPM SAS drive with UFS on the CoolThreads box is really nice though… makes me want that kind of disk here.

This page was useful, and I used it as a basis for some of my DTrace scripts: http://fav.or.it/post/1146360/dtrace-and-the-mighty-hercules

Also thanks to several people on #opensolaris on Freenode who helped me out with various Solaris specific commands in tracking this down.

Debian about 1234533 times easier to install than Solaris

After many hours trying to netboot the T1000 to install Solaris Express, I wondered: how hard is it for Debian?

Easy: get the sparc64 boot.img, put it on a TFTP server, add filename "boot.img"; (or similar) to the dhcp config, and boot the T1000 from the service console with bootmode bootscript="boot net:dhcp" followed by restart -c. Install away!
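That is, in full (nothing here beyond the pieces just named):

# dhcpd.conf, in the host entry for the T1000:
filename "boot.img";

# then, from the ALOM service console:
sc> bootmode bootscript="boot net:dhcp"
sc> restart -c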

As for Solaris?

Well… dhcpd.conf:

option space SUNW;
option SUNW.root-mount-options code 1 = text;
option SUNW.root-server-ip-address code 2 = ip-address;
option SUNW.root-server-hostname code 3 = text;
option SUNW.root-path-name code 4 = text;
option SUNW.swap-server-ip-address code 5 = ip-address;
option SUNW.swap-file-path code 6 = text;
option SUNW.boot-file-path code 7 = text;
option SUNW.posix-timezone-string code 8 = text;
option SUNW.boot-read-size code 9 = unsigned integer 16;
option SUNW.install-server-ip-address code 10 = ip-address;
option SUNW.install-server-hostname code 11 = text;
option SUNW.install-path code 12 = text;
option SUNW.sysid-config-file-server code 13 = text;
option SUNW.JumpStart-server code 14 = text;
option SUNW.terminal-name code 15 = text;
option SUNW.SbootURI code 16 = text;

host hurricane {
hardware ethernet 0:14:4f:1e:28:e;
fixed-address 192.168.1.19;
option host-name "hurricane";
filename "sparc64-etch-boot.img";
#       filename "sol-nv-b103-sparc";
#       option SUNW.install-server-ip-address 192.168.1.1;
#       option SUNW.install-server-hostname "saturn";
#       option SUNW.install-path "/mnt/sol-nv-b103-sparc/";
#       option SUNW.root-server-ip-address 192.168.1.1;
#       option SUNW.root-server-hostname "saturn";
#       option SUNW.root-path-name "/mnt/sol-nv-b103-sparc/Solaris_11/Tools/Boot";

}

(obviously swapping the comments around) and with the Solaris Express DVD mounted and NFS-exported… it *still* doesn’t work. It fails with “unable to mount filesystem” and no further hints (even when tcpdumping the network).

Documentation for doing the simple thing of using $dhcp_server and $nfs_server to network boot a Solaris install on a SPARC box is *COMPLETELY* missing.

Now, I’m a smart guy (and if you don’t believe that, at least believe I’m not stupid). If I can’t get it to boot the installer, what chance do others have?

I’ll try OpenSolaris out when it’s on SPARC (and please oh please oh please just have an easy way to net boot the installer using a Linux host). Please take the Debian way (just a single file on TFTP).

So now it’s goodbye Solaris (I’m not going to have something I can’t re-install, upgrade or security patch) and it’s hello Debian (and sanity).

Yes, this does mean I’ll care about Drizzle on Linux SPARC.