effective bk usage

(inspired by jimw talking about it on Planet MySQL)

I take a bit of a different approach…

I’ve got directories for 4.0, 4.1 and 5.0, and within each of them I have a clone of the main ndb tree (called ndb, so there’s a path like “MySQL/5.0/ndb”). I don’t ever edit in this tree; it’s my clean one, and I use it only for pulling and pushing.

I then ‘bk clone -lq ndb foobar’ (where foobar is whatever I’m working on, or ndb-foobar, depending on mood). If there’s a separate part to foobar (e.g. a stage 2), I clone it off the foobar tree (e.g. ‘bk clone -lq foobar foobar2’). The idea is that I can work on stage 2 before stage 1 is pushed, and if any problems turn up in stage 1 I can fix them there and pull the fixes into the stage 2 tree, rather than being stuck once stage 1 has already been pulled into stage 2. A sketch of the workflow is below.
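
Roughly, the layout and workflow look something like this (just a sketch; foobar stands in for whatever I happen to be working on):

cd MySQL/5.0
bk clone -lq ndb foobar        # hardlinked working clone off the clean tree
bk clone -lq foobar foobar2    # stage 2, cloned off the stage 1 tree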

I use the -q (quiet) option to bk a lot, because printing a few thousand lines to the screen tends to slow things down.

It would be interesting to investigate ways to improve hardlinking performance, as a clone takes longer than it really should (reading about 20,000 inodes and writing about 20,000 inodes). Although (I haven’t checked this) a clone probably *copies* the checked-out files rather than linking them, so it’s really copying about 10,000 files and linking about 10,000 files.
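
One rough way to check this (which I haven’t done) would be to count files with a link count above one in a fresh clone, something like this (linktest being just a scratch clone):

bk clone -lq ndb linktest
find linktest -type f -links +1 | wc -l    # files hardlinked to the clean tree
find linktest -type f -links 1 | wc -l     # files that were copied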

A ‘du -sh’ on a 5.0-ndb clone says 232MB. After ‘bk -r clean’ (i.e. with no files checked out) it’s 136MB, so cloning with the -l option saves you 136MB per clone. Now, if BK (and we, really) always checked files out read-only, those files could also be hard linked across clones, which would save a further 96MB per clone. Checking a file out for writing would then create a copy of it, so you’d only use disk space for the files you’re editing (and only use disk space for the sfile when you check in).
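
Those numbers come from something like the following (exact sizes will obviously vary with the tree):

du -sh foobar      # ~232MB with everything checked out
(cd foobar && bk -r clean)
du -sh foobar      # ~136MB: just the sfiles, which -l hardlinks to the clean tree
# 232MB - 136MB = 96MB of checked-out files per clone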

“What about directory disk usage?” I hear you ask. Well, a MySQL 5.0 clone has 1053 directories, so each clone uses 1053 inodes. On XFS with 256-byte inodes (the default) that works out to about 263KB of disk space. Even in a bad case where each of these directories needs another block of disk space (to hold the directory entries for those 10,000 files), that’s only about 4MB of disk usage.
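
The directory count and arithmetic, roughly:

find foobar -type d | wc -l      # 1053 directories in a 5.0 clone
# 1053 inodes x 256 bytes        = ~263KB of inode space
# 1053 dirs   x one 4KB block    = ~4MB in the bad case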

For a really clever bk (and config) you could have a clean tree of 232MB and have each clone take up less than 5MB of disk space. As it is, the checked-out files aren’t hard linked, so each clone takes up an extra 96MB.

In essence, we’re using about 19 times more disk space per clone than we need to (96MB instead of roughly 5MB).

Now, if ccache is really clever (I don’t know if it is), it would hardlink object files so we’d use even *less* disk space for a compiled tree (a ‘du -sh’ on a compiled max-debug tree is about 1GB).

Why have I gone on about this so much? Well, disk is cheap, except in laptops, where replacing the disk or getting a bigger one isn’t easy or cheap.

Also, backup is expensive, slow and awkward.

Maybe at some later time I’ll talk about the theoretical IO usage of some things…
