{"id":3655,"date":"2014-01-14T15:01:33","date_gmt":"2014-01-14T05:01:33","guid":{"rendered":"https:\/\/www.flamingspork.com\/blog\/?p=3655"},"modified":"2014-10-08T09:14:24","modified_gmt":"2014-10-07T23:14:24","slug":"converting-mysql-trees-to-git","status":"publish","type":"post","link":"https:\/\/www.flamingspork.com\/blog\/2014\/01\/14\/converting-mysql-trees-to-git\/","title":{"rendered":"Converting MySQL trees to git"},"content":{"rendered":"<p>I have put up a set of scripts on github: <a href=\"https:\/\/github.com\/stewartsmith\/bzr-to-git-conversion-scripts\">https:\/\/github.com\/stewartsmith\/bzr-to-git-conversion-scripts<\/a>. Why do I need these? Well&#8230; if only bzr fast-export|git fast-import worked flawlessly for large, complex and old trees. It doesn&#8217;t.<\/p>\n<p>Basically, when you clone this repo you can run &#8220;.\/sync-BLAH.sh&#8221; and it&#8217;ll pull BZR trees for the project, convert to git and clean things up a bit. You will likely have to edit the sync-BLAH.sh scripts as I have them pointed at branches on my own machine (to speed up the process, not having to do fresh BZR branches of MySQL trees over the network is a <strong>feature <\/strong>&#8211; it&#8217;s never been <em>fast<\/em>.). 
You&#8217;ll also want to edit the git remotes to point where you want git trees to end up.<\/p>\n<p>I&#8217;ve done it for:<\/p>\n<ul>\n<li><a href=\"http:\/\/bazaar-vcs.org\/\">Bazaar<\/a> itself (sync-bzr.sh)<\/li>\n<li><a href=\"http:\/\/www.drizzle.org\/\">Drizzle<\/a> (sync-drizzle.sh)<\/li>\n<li><a href=\"http:\/\/flamingspork.com\/projects\/libeatmydata\">libeatmydata<\/a> (sync-libeatmydata.sh)<\/li>\n<li><a href=\"http:\/\/www.mysql.com\">MySQL<\/a> (sync-mysql.sh)<\/li>\n<li><a href=\"http:\/\/percona.com\/software\/percona-server\/\">Percona Server<\/a> (sync-ps.sh)<\/li>\n<li><a href=\"http:\/\/percona.com\/software\/percona-xtradb-cluster\">Percona XtraDB Cluster<\/a> (sync-pxc.sh)<\/li>\n<li><a href=\"http:\/\/www.percona.com\/software\/percona-xtrabackup\">Percona XtraBackup<\/a> (sync-xb.sh)<\/li>\n<\/ul>\n<p>What problems did I hit? Well&#8230; the first is performance, things are <strong>slow<\/strong> unless you tweak a bunch of knobs, and then it&#8217;s just rather slow rather than <strong>slow<\/strong>. So in the empty git repo I set core.compression=1, which makes zlib a whole lot faster.<\/p>\n<p>I naturally give the correct incantation to bzr fast-export to munge tag names appropriately, set a git branch name (each BZR branch ends up as a git branch) and use a marks file (this speeds up incremental syncs).<\/p>\n<p>For one of these branches I was importing, BZR had allowed the invalid committer of &#8220;billy-earney billy.earney@gmail.com\\n &lt;&gt;&#8221; &#8211; yes, a <strong>newline<\/strong> in the committer. 
This messes up the fast-import format so I have to run the entire fast-export output through sed to clean it up.<\/p>\n<p>We then use bzr fast-import-filter to apply a <a href=\"https:\/\/github.com\/stewartsmith\/bzr-to-git-conversion-scripts\/blob\/master\/user-map.txt\">user map<\/a> &#8211; which is me looking at the appropriate committers and cleaning them up so that we get better attribution in the resulting git trees as well as cleaning up some errors in the bzr tree so that Git likes them (most notably, missing &lt; or (not and) &gt; around email addresses). The user map is fairly Percona-specific, but there&#8217;s at least one or two for Oracle committers too.<\/p>\n<p>Next, I pass the output through pv(1) &#8211; to do two things: monitor the output to see that it&#8217;s still going, and to have a transfer buffer so that git fast-import doesn&#8217;t stall waiting for output &#8211; amazingly enough, this gave a decent boost to import speed.<\/p>\n<p>Finally, when we&#8217;re done doing the import of all of the revisions for all of the bzr branches, if this is our first run, we set the HEAD ref to the last BZR branch name and then do a git repack. Through experimentation, I&#8217;ve found that &#8220;git repack -AdfF --depth=100 --window=500&#8221; is what gives me the smallest size possible.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have put up a set of scripts on github: https:\/\/github.com\/stewartsmith\/bzr-to-git-conversion-scripts. Why do I need these? Well&#8230; if only bzr fast-export|git fast-import worked flawlessly for large, complex and old trees. It doesn&#8217;t. 
Basically, when you clone this repo you can &hellip; <a href=\"https:\/\/www.flamingspork.com\/blog\/2014\/01\/14\/converting-mysql-trees-to-git\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[76,75,14,423],"tags":[70,129,628],"class_list":["post-3655","post","type-post","status-publish","format-standard","hentry","category-code","category-drizzle-work-et-al","category-mysql","category-percona","tag-drizzle","tag-git","tag-mysql"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5a6n8-WX","jetpack-related-posts":[{"id":3413,"url":"https:\/\/www.flamingspork.com\/blog\/2013\/09\/24\/disk-usage-bzr-vs-git\/","url_meta":{"origin":3655,"position":0},"title":"Disk usage: bzr vs git","author":"Stewart Smith","date":"2013-09-24","format":false,"excerpt":"For MySQL 5.1, 5.5 and 5.6 in the same repository, after repacking: bzr: 269MB (217MB pack, 52MB indicies) git: 177MB repo (152MB pack) One thing I'll say is that BZR is always more chatty over the network and is substantially slower than GIT in pulling a fresh copy.","rel":"","context":"In 
&quot;code&quot;","block_context":{"text":"code","link":"https:\/\/www.flamingspork.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3419,"url":"https:\/\/www.flamingspork.com\/blog\/2013\/10\/02\/the-end-of-bazaar\/","url_meta":{"origin":3655,"position":1},"title":"The end of Bazaar","author":"Stewart Smith","date":"2013-10-02","format":false,"excerpt":"I've used the Bazaar (bzr) version control system since roughly 2005. The focus on usability was fantastic and the team at Canonical managed to get the entire MySQL BitKeeper history into Bazaar - facilitating the switch from BitKeeper to Bazaar. There were some things that weren't so great. Early on\u2026","rel":"","context":"In &quot;code&quot;","block_context":{"text":"code","link":"https:\/\/www.flamingspork.com\/blog\/category\/code\/"},"img":{"alt_text":"Screenshot from 2013-10-02 10:32:19","src":"https:\/\/i0.wp.com\/www.flamingspork.com\/blog\/wp-content\/uploads\/2013\/10\/Screenshot-from-2013-10-02-103219-300x59.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":3416,"url":"https:\/\/www.flamingspork.com\/blog\/2013\/09\/26\/an-experimental-git-mirror-of-drizzle\/","url_meta":{"origin":3655,"position":2},"title":"An Experimental GIT mirror of Drizzle","author":"Stewart Smith","date":"2013-09-26","format":false,"excerpt":"I've been mirroring a bunch of projects that have their source control in BZR up onto github recently. 
This turns out to be a bit harder than it sounds for a bunch of reasons that aren't particularly interesting (although having a commit in the bzr repo where the name of\u2026","rel":"","context":"In &quot;code&quot;","block_context":{"text":"code","link":"https:\/\/www.flamingspork.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1016,"url":"https:\/\/www.flamingspork.com\/blog\/2008\/02\/22\/bzr-loom-a-bzr-plugin-with-quilt-like-functionality\/","url_meta":{"origin":3655,"position":3},"title":"bzr-loom &#8211; a bzr plugin with quilt like functionality","author":"Stewart Smith","date":"2008-02-22","format":false,"excerpt":"A bzr plugin to assist in developing focused patches. in Launchpad I use quilt a lot for development. Currently, If I had to choose between BK and quilt - I'd choose quilt. I use bzr in other development projects like MemberDB. I use git as a frontend for SVN (it\u2026","rel":"","context":"In &quot;MemberDB&quot;","block_context":{"text":"MemberDB","link":"https:\/\/www.flamingspork.com\/blog\/category\/memberdb\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":793,"url":"https:\/\/www.flamingspork.com\/blog\/2007\/02\/14\/svn-shows-its-true-colours\/","url_meta":{"origin":3655,"position":4},"title":"SVN shows its&#8217; true colours","author":"Stewart Smith","date":"2007-02-14","format":false,"excerpt":"I thought \"svn\", I typed \"cvs\". Hrrm... sounds about right. In other revision control news, using quilt to manage work-in-progress patches in conjunction with BK is proving really, really great. I feel like an idiot having lived this long and not worked this way. 
I have a feeling that if\u2026","rel":"","context":"In &quot;mysql&quot;","block_context":{"text":"mysql","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/mysql\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3254,"url":"https:\/\/www.flamingspork.com\/blog\/2013\/03\/12\/is-mysql-bigger-than-linux\/","url_meta":{"origin":3655,"position":5},"title":"Is MySQL bigger than Linux?","author":"Stewart Smith","date":"2013-03-12","format":false,"excerpt":"I'm going to take the numbers from my previous post, MySQL Modularity, Are We There Yet? for the \"kernel\" size of MySQL - that is, everything that isn't a plugin or storage engine. For Linux kernel, I'm just going to use the a-bit-old git tree I have on my laptop.\u2026","rel":"","context":"In &quot;code&quot;","block_context":{"text":"code","link":"https:\/\/www.flamingspork.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/3655","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/comments?post=3655"}],"version-history":[{"count":2,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/3655\/revisions"}],"predecessor-version":[{"id":3826,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/3655\/revisions\/3826"}],"wp:attachment":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/media?parent=3655"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/catego
ries?post=3655"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/tags?post=3655"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}