and now for something completely different…

As many of you know, I’ve been working in the MySQL world for quite a while now. IN fact, it was nearly 10 years ago when I first started hacking on MySQL Cluster at MySQL AB.

Most recently, I was at Percona which was a wonderful journey where over my nearly three years there the company at least doubled in size, launched several new software products and greatly improved the quality and frequency of releases.

However the time has come for something completely different. The MySQL world is rather mature, the future of Percona software is bright and, well, I could do with poking into something rather different.

So a couple of weeks ago I started at IBM in the Linux Technology Centre working on KVM on POWER and related things. No doubt there’ll be interesting things to blog about as time goes on, but it’s about time I posted my change of employment :)

Converting MySQL trees to git

I have put up a set of scripts on github: https://github.com/stewartsmith/bzr-to-git-conversion-scripts. Why do I need these? Well… if only bzr fast-export|git fast-import worked flawlessly for large, complex and old trees. It doesn’t.

Basically, when you clone this repo you can run “./sync-BLAH.sh” and it’ll pull BZR trees for the project, convert to git and clean things up a bit. You will likely have to edit the sync-BLAH.sh scripts as I have them pointed at branches on my own machine (to speed up the process, not having to do fresh BZR branches of MySQL trees over the network is a feature – it’s never been fast.). You’ll also want to edit the git remotes to point where you want git trees to end up.

I’ve done it for:

What problems did I hit? Well… the first is performance, things are slow unless you tweak a bunch of knobs, and then it’s just rather slow rather than slow. So in the empty git repo I set core.compression=1, which makes zlib a whole lot faster.

I naturally give the correct incantation to bzr fast-export to munge tag names appropriately, set a git branch name (each BZR branch ends up as a git branch) and use a marks file (this speeds up incremental syncs).

For one of these branches I was importing, BZR had allowed the invalid committer of “billy-earney billy.earney@gmail.com\n <>” – yes, a newline in the committer. This messes up the fast-import format so I have to run the entire fast-export output through sed to clean it up.

We then use bzr fast-import-filter to apply a user map – which is me looking at the appropriate committers and cleaning them up so that we get better attribution in the resulting git trees as well as cleaning up some errors in the bzr tree so that Git likes them (most notably, missing < or (not and) > around email addresses). The user map is fairly Percona specific, but there’s at least one or two for Oracle committers too.

Next, I pass the output through pv(1) – to do two things: monitor the output to see that it’s still going, and to have a transfer buffer so that git fast-import doesn’t stall waiting for output – amazingly enough, this gave a decent speed boost to import speed.

Finally, when we’re done doing the import of all of the revisions for all of the bzr branches, if this is our first run, we set the HEAD ref to the last BZR branch name and then do a git repack. Through experimentation, I’ve found that “git repack -AdfF –depth=100 –window=500” is what gives me the smallest size possible.

My lca2014 talk video: Past, Present and Future of MySQL and variants

On last Wednesday morning I gave my talk at linux.conf.au 2014. You can now view and download the recording of it here:

http://mirror.linux.org.au/linux.conf.au/2014/Wednesday/28-Past_Present_and_future_of_MySQL_and_variants_-_Stewart_Smith.mp4

(hopefully more free formats will come soon, the all volunteer AV team has been absolutely amazing getting things up this quickly).