Registered for linux.conf.au

So, I just registered for linux.conf.au and when ticking all those check boxes for years past, I worked out that this will be my tenth linux.conf.au! Wow… that’s a few of them.

Over the past nine I’ve attended, I’ve gone to great sessions, met interesting people, discovered interesting projects and made good friends.

I wonder what this one will bring….

Puppet snippet for setting up a machine to build Drizzle

You could use this in a Vagrant setup if you like (I’ve done so for testing).

Step 1) Set the following in your Vagrantfile:

Vagrant::Config.run do |config|
  config.vm.box = "lucid32"
  config.vm.box_url = "http://files.vagrantup.com/lucid32.box"
  config.vm.provision :puppet
end

Step 2) Get the puppet-apt helper.

I used https://github.com/evolvingweb/puppet-apt and put it in a manifests/ directory like so:

$ mkdir manifests
$ cd manifests
$ git clone git://github.com/evolvingweb/puppet-apt.git

Step 3) Write your puppet manifest:

import "puppet-apt/manifests/init.pp"
import "puppet-apt/manifests/ppa.pp"
class drizzlebuild {
        apt::ppa { "ppa:drizzle-developers/ppa": }
        package { "drizzle-dev":
                  ensure => latest,
        }
}
include drizzlebuild
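I saved that as manifests/default.pp – as far as I recall, that’s the file the Vagrant Puppet provisioner looks for by default.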

Step 4) “vagrant up” and you’re done! Feel free to build Drizzle inside this VM.

I’m sure there’s a more proper way to do it all, but this was a pretty neat first intro to Puppet and friends for me :)

Puppet + Vagrant + jenkins = automated bliss

I’m currently teaching myself how to do Puppet. Why? Well, at Percona we support a bunch of platforms for our software. This means we have to maintain a bunch of Jenkins slaves to build the software on. We want to add new machines and have (up until now) maintained a magic “apt-get install” command line in the Jenkins EC2 configuration. This isn’t an ideal situation and there’s been talk of getting Puppet to do the heavy lifting for a while.

So I sat down to do it.

Step 1: take the “apt-get install” line and convert it into puppet speak.

This was pretty easy. I started off with Vagrant booting an Ubuntu Lucid 32 VM (just like in the Vagrant getting started guide) and enabled the “provision using Puppet” bit.

Step 2: find out you need to run “apt-get update”

Since the base VM image was made there have been updates, so I needed to make any package installation depend on running “apt-get update” first – both to ensure I was installing the latest version and that the repositories would still have the files I was looking for.

This was pretty easy (once I knew how):

exec {"apt-update":
       command => "/usr/bin/apt-get update",
}
Exec["apt-update"] -> Package <| |>

This does two things: it declares an exec resource that runs “apt-get update”, and it uses the Package <| |> collector (which matches every package resource) to make all package installs depend on “apt-update” having run first.

I’ve also needed things such as:

case $operatingsystem {
     debian, ubuntu: { $libaiodev = "libaio-dev" }
     centos, redhat: { $libaiodev = "libaio-devel" }
     default: { fail("Unrecognised OS for libaio-dev") }
}
package { "libaio-dev":
          name => $libaiodev,
          ensure => latest,
}

The idea being that when I go and test all this stuff running on CentOS, it should mostly “just work” there too.

The next step? Setting up and running the Jenkins slave.
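That part isn’t written yet, but the bones of it should be small. A first sketch of what the manifest might look like (the package name and user settings here are my assumptions for Ubuntu, not something we’ve deployed):

class jenkinsslave {
        # Java runtime for the Jenkins slave agent to run in
        package { "default-jre-headless":
                  ensure => latest,
        }
        # account for the Jenkins master to connect as
        user { "jenkins":
               ensure     => present,
               managehome => true,
        }
}
include jenkinsslave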

Information on Bug#12704861 (which doesn’t exist in any public bug tracker)

Some of you may be aware that MySQL is increasingly using an Oracle-internal bug tracker. You can see these large bug numbers mentioned alongside smaller public bug numbers in recent MySQL release notes. If you’re particularly unlucky, you just get a big Oracle-internal bug number. For a recently fixed bug, I dug further and posted the details on the Percona blog: http://www.mysqlperformanceblog.com/2011/11/20/bug12704861/

Possibly interesting reading for those of you who are interested in InnoDB, MySQL, BLOBs and crash recovery.

pandora-build: autotools made easy

Way back in 2009, Monty Taylor got fed up with maintaining a set of common autotools foo across several projects (one of which was Drizzle) and started the pandora-build project. Basically, it’s a collection of the foo you need to get autotools to do things like: behave properly, detect a bunch of common libraries, enable crap-tons of compiler warnings (and -Werror), and build an application/library with plugins (that are auto-discovered and built).

(and don’t worry, there are also modes to disable -Werror and the various compiler warnings if you’re working on an old code base that really doesn’t build cleanly)

There are also templates for Quickly to get you up and running really quickly.

Basically, for the past 3 years, whenever I’ve gone to write some small project (or got sufficiently annoyed with the broken build system on an old one), I’ve turned to pandora-build to solve my problems.
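To give you an idea of how little there is to it: the configure.ac of a pandora-build project boils down to something like this (sketched from memory of how Drizzle does it, so treat the exact macro arguments as approximate), with the pandora m4 macros sitting in m4/ in your tree:

AC_INIT([myproject],[0.1],[https://example.org/bugs])
AC_CONFIG_AUX_DIR(config)
PANDORA_CANONICAL_TARGET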

Recently, I’ve had the need to use the plugin infrastructure of pandora-build in a new project (I’ve used it extensively in Drizzle of course). The one bit that pandora does not take care of for you is the dlopen() code to load plugins at run time… although I do wonder about turning some of that code into a bit of a library just because a bunch of it is pretty common….
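The run-time side is the standard dlopen()/dlsym() dance. A rough sketch of the common bit in C (the plugin_init symbol name is made up for illustration – pandora-build doesn’t dictate one):

#include <dlfcn.h>
#include <stdio.h>

/* hypothetical plugin entry point; the symbol name is for illustration only */
typedef int (*plugin_init_fn)(void);

static int load_plugin(const char *path)
{
  void *dl = dlopen(path, RTLD_NOW | RTLD_LOCAL);
  if (dl == NULL)
  {
    fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
    return -1;
  }

  plugin_init_fn init = (plugin_init_fn) dlsym(dl, "plugin_init");
  if (init == NULL)
  {
    fprintf(stderr, "dlsym(plugin_init): %s\n", dlerror());
    dlclose(dl);
    return -1;
  }

  return init();
}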

Of course, a task for me is to write up a blog post on how I did it all, but for the moment I thought I’d just share :)

Using Jenkins to parse sphinx warnings

At Percona, we’re now using Sphinx for our documentation. We’re also using Jenkins for our continuous integration. We have compiler warnings from GCC being parsed by Jenkins using the built-in filters, but there isn’t one for the Sphinx warnings.

Luckily, in the configuration page for Jenkins, the Warnings plugin allows you to specify your own filters. I’ve added the following filter to process warnings from sphinx:

For those who want to copy and paste:

Regex: ^(.*):(\d+): \((.*)\) (.*)

Mapping Script

import hudson.plugins.warnings.parser.Warning
String fileName = matcher.group(1)
String lineNumber = matcher.group(2)
String category = matcher.group(3)
String message = matcher.group(4)

return new Warning(fileName, Integer.parseInt(lineNumber), "sphinx", category, message);

Example log message: /home/stewart/percona-server/docs-5.1/doc/source/release-notes/Percona-Server-1.0.2-3.rst:67: (WARNING/2) Inline literal start-string without end-string.
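Run that line through the regex above and group(1) is the file name, group(2) the line number (67), group(3) the category (WARNING/2) and group(4) the message – exactly what the mapping script hands to the Warning constructor.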

Then I can select this filter from the job that builds (and publishes) our documentation and it shows up like any other compiler warnings. Neat!

TODO: get the intersphinx warnings also in there

TODO: fix the linkcheck target in Sphinx so that it’s easily parseable and can also be integrated.

Speaking at Percona Live London 2011 (on Drizzle!)

Henrik and I will both be at Percona Live London 2011 in late October, speaking on the wonderful Drizzle database server.

Other speakers at the conference will be talking about a wide range of topics surrounding the MySQL ecosystem including performance monitoring, backup, search, scaling and data recovery.

P.S. I do have a discount code – ask me in the comments for it!

Without notmuch, I would simply delete your email

I have been using notmuch (http://notmuchmail.org/) as my email client for quite a while now. It’s fast. I don’t mean that everything happens instantly (some actions do take a bit longer than ideally they would), but with the quantity of mail I (and others) throw at it? Beats everything else I’ve ever tried.

I keep seeing people complain about not being able to keep up with various email loads and I am convinced it is because their mail client sucks.

I hear people go on about mutt…. well, I stopped using mutt when it would take two minutes to open some of my mail folders.

I hear some people talk about Evolution…. well, I stopped using Evolution when I realized that when it was rebuilding its index, I couldn’t use my mail client for at least twenty minutes.

Gmail…. well, maybe. Except that I don’t want all my mail sitting on Google servers, I want to be able to work disconnected, and the amount of time it would take to upload my existing mail makes it a non-starter (especially from the arse end of the internet – Australia). I also do not want email on my phone.

My current problem with notmuch? It just uses Maildir…. and that isn’t the most efficient format for mail that never changes; some kind of compressed archive format would be great. Indeed, I started looking into this ages ago, but just haven’t had the spare cycles to complete it (and getting SSDs everywhere has not helped with the motivation).

Coming back from vacation, my mailbox had about 4,700 messages sitting in it. I’ve been able to get through just about all of them without blindly deleting mail. This is largely due to notmuch’s great UI for quickly looking at threads, marking them read and progressing to the next message. I can tag mail for action, I can very quickly search for email on an urgent topic (and find it), and generally get on with the business of getting things done rather than operating an email program.
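For flavour, the command-line version of that workflow looks something like this (the tag names are arbitrary):

$ notmuch search tag:inbox and subject:urgent
$ notmuch tag +action -inbox -- thread:0000000000001f2e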

MySQL no longer a fully open source database

Just in case anybody missed it: http://blogs.oracle.com/MySQL/entry/new_commercial_extensions_for_mysql

MySQL has long been an open source product, not an open source project…. and this really is the final nail in that.

To me, this was expected, but it’s still sad to see it.

I am very, very glad we have diverse copyright ownership in Drizzle so that this could not happen easily at all.

Speaking at OSCON

OSCON is coming up again: July 25th-29th in wonderful Portland, OR. If you come to OSCON, not only will you be at OSCON, but you’ll be in Portland in July – which is just lovely. We’ll also have other people from Percona there and it should be a great lot of fun.

So come along and talk interesting technology – I’ll be speaking on “Dropping ACID: Eating Data in a Web 2.0 Cloud World”. If you know my love of reliable software with known failure modes, you’ll probably enjoy this one – especially if you saw my “Eat My Data: How everybody gets POSIX file I/O Wrong”, which drew a packed room with people in the aisles and out the door when given at OSCON 2008.

Speaking has its pluses and minuses of course. It takes a very, very long time to construct a good talk (I would guess 12hrs for each hour of speaking) and you do miss out on some other sessions (preparing, going over, final touches, rehearsals). It is rather rewarding though, and sparks very interesting conversations afterwards.

I should also mention, if you want 20% off OSCON rego: use os11fos as discount code.

HOWTO fix: bzr join error of “Trees have the same root”

From https://answers.launchpad.net/bzr/+question/71563

You can do it from within Python like this:
>>> import bzrlib.workingtree
>>> bzrlib.workingtree.WorkingTree.open("subdir2").set_root_id("tree_root_subdir2")

Hopefully I can find this easily in the future (I’ve had to use it before).

xtrabackup bazaar repositories upgraded to 2a format

I have just upgraded the xtrabackup bazaar code repositories to the 2a format. This means that bzr 1.16 is required to access the source code repositories now.

If you get an error like the one below when working with a local branch, you’ll need to run “bzr upgrade” in it (see below for example). For branches on launchpad, you can use the web UI and hit the “upgrade branch” button.

stewart@willster:~/src/percona-xtrabackup$ bzr pull
Using saved parent location: bzr+ssh://bazaar.launchpad.net/%2Bbranch/percona-xtrabackup/
Doing on-the-fly conversion from RemoteRepositoryFormat(_network_name='Bazaar repository format 2a (needs bzr 1.16 or later)\n') to RepositoryFormatKnitPack1().
This may take some time. Upgrade the repositories to the same format for better performance.
bzr: ERROR: KnitPackRepository('file:///home/stewart/src/percona-xtrabackup/.bzr/repository/')
is not compatible with
RemoteRepository(bzr+ssh://bazaar.launchpad.net/%2Bbranch/percona-xtrabackup/.bzr/)
different rich-root support
stewart@willster:~/src/percona-xtrabackup$ bzr upgrade
Upgrading branch file:///home/stewart/src/percona-xtrabackup/ ...              
starting upgrade of file:///home/stewart/src/percona-xtrabackup/
making backup of file:///home/stewart/src/percona-xtrabackup/.bzr
  to file:///home/stewart/src/percona-xtrabackup/backup.bzr.~1~
starting repository conversion                                                 
repository converted                                                           
finished
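(As an aside: if you want to check what format a local branch is in before hitting the error, run “bzr info” in it – the format is printed on the first line of output.)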

Joining Percona

As you may have read on the MySQL Performance Blog post – I’ve recently joined Percona. This is a fairly exciting next step. I’ll be in New York for Percona Live next week, where I’ll be giving a session titled “Drizzle 7, GA and Supported: Current & Future Features”.

I’ll write more soon, there’s a lot keeping me busy already!

Drizzle JSON interface merged

https://code.launchpad.net/~stewart/drizzle/json-interface/+merge/59859

Currently a very early version of course, but it’s there in trunk if you want to play with it. Just have libcurl and libevent installed and you can submit queries via HTTP and JSON. Of course, the next steps are getting a true non-SQL interface going and seeing how people go with it.
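As a quick smoke test once it’s built and running (using the version endpoint from the proof of concept):

$ curl http://localhost:8765/0.1/version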

Friendly exploits

If you happen to be friends with me on Facebook you will have seen a bunch of rather strange updates from me last night. This all started with a tweet (that was also sent to Facebook) by a friend who joked about doing something with the <MARQUEE> tag (see http://www.angelfire.com/super/badwebs/ for an example of it and similar things). I saw the joke, as I was reading it through Gwibber or the Facebook website. However…. Leah saw text scrolling over the screen… just like the <MARQUEE> tag actually did.

She was looking at it on her iPad using an app called Friendly.

So I immediately posted a status update: “<script lang="javascript">alert("pwned");</script>”. This is a nice standard little test to see if you’ve managed to inject code into a web site. If this pops up a dialog box, you’ve made it.

It didn’t work. It didn’t display anything… as if it was just not running the script tag. Disappointing. I soooo wanted it to break here.

I did manage to do all sorts of other things in the Live Feed view though. I could use just about any other HTML tag… including forms. I couldn’t get an HTTP request to my server out of an HTML form in the Live Feed view… but at one point we did manage to crash Friendly (enough that it had to be force quit on the iPad).

I posted a photo of me holding up the iPad to my laptop web cam to show off the basics:

And then one of what happened when I tried an HTML form (this wasn’t reproducible though… so kind of disappointing):

What we did notice, however, was that HTML tags were parsed in comments on images too…. which made me wonder… It’s pretty easy to make an HTML form button that will do something… so I posted the same image again with a button that said “Next” but would take you to a web page on one of my servers instead. It worked! I got an HTTP request! Neat! I could then present an HTML page that looked legit and do the standard things one does to steal from you.

But I wondered if scripts would work…. so I posted:

Photos are proving more exploitable.... <script lang="javascript">alert("pwned");</script>

and then clicked on the image on the iPad……

Gotcha!

I could from here do anything I wanted.

Next… I should probably report this to the developers…. or steal from my friends and make them post things to Facebook implying improper relationships and other things that would get them fired.

I went with the former… but the latter would have been fairly easy as the Facebook page for the app nicely tells me which of my friends use it. I could even target my attack!

So I sent a warning message to friends (the 18 of them who use the Friendly app), sent a “contact the developer” message to the developers, sent out a warning on Twitter and went to bed.

Got an email overnight back from the developer: “We just pushed a server update that solves this issue.”

Now… in my tcpdump while trying some of the earlier things, I was just seeing HTTPS requests to Facebook API servers from the iPad, but I don’t think I looked too closely at the images. I have no idea if they’ve actually fixed the holes, and I don’t have an iPad to test it on. If you do, go try it.

HTTP JSON AlsoSQL interface to Drizzle

So… I had another one of those “hrrm… this shouldn’t be hard to hack up as a proof-of-concept” moments. Web apps are increasingly speaking JSON all over the place. Why can’t we speak JSON to/from the database? Why? Seriously, why not?

One reason why MongoDB has found users is that JSON is very familiar to people. It has gained popularity in spite of a pure disregard for the integrity and safety of your data.

So I started with a really simple idea: an HTTP server in the database server. Thanks to how simple libevent makes that, I got it going fairly quickly. Finding a rather nice C++ library to create and parse JSON was the next challenge. I found JSONcpp, a public domain library with a nice API, and proceeded to bring it into the tree (it’s not much code). I then created a simple way to find out the version of the Drizzle server you were speaking to:

$ curl http://localhost:8765/0.1/version
{
   "version" : "2011.04.15.2285"
}

But that wasn’t nearly enough… I also wanted to be able to issue arbitrary queries. Thanks to the supporting code we have in the Drizzle server for EXECUTE() (also used by the replication slave), this was also pretty easy. I created a way to execute the content of an HTTP POST request as if you had passed it to EXECUTE() – all nicely wrapped in a transaction.

I created a simple table using the drizzle client, connecting over a normal TCP socket speaking the MySQL protocol and inserted a row in it:

$ ../client/drizzle --port 9306 test
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 4
Connection protocol: mysql
Server version: 2011.04.15.2285 Source distribution (json-interface)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `a` INT NOT NULL AUTO_INCREMENT,
  `b` VARCHAR(100) COLLATE utf8_general_ci DEFAULT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB COLLATE = utf8_general_ci
1 row in set (0.001209 sec)

drizzle> insert into t1 (b) values ("from mysql protocol");
Query OK, 1 row affected (0.00207 sec)

Now to select rows from it via HTTP and get a JSON object back with the result set:

$ curl http://localhost:8765/0.1/sql --data 'select * from t1;'
{
   "query" : "select * from t1;",
   "result_set" : [
      [ "1", "from mysql protocol" ],
      [ "", "" ]
   ],
   "sqlstate" : "00000"
}

I can also insert more rows using the HTTP interface and then select them from the MySQL protocol interface:

$ curl http://localhost:8765/0.1/sql --data 'insert into t1 values (NULL, \"from HTTP!\");'
{
   "query" : "insert into t1 values (NULL, \\\"from HTTP!\\\");",
   "sqlstate" : "00000"
}

drizzle> select * from t1;
+---+---------------------+
| a | b                   |
+---+---------------------+
| 1 | from mysql protocol | 
| 2 | from HTTP!          | 
+---+---------------------+
2 rows in set (0.000907 sec)

So what does this get us? With the addition of proper authentication, you could start doing some really quite neat and nifty things. I imagine we could add interfaces to avoid SQL and directly do key lookups, table scans and index range scans, giving really quite sweet performance. We could start building web tools to manage and manipulate the database speaking the native language of the web.

But… there’s more!

Since we have a web server and a way to execute queries via HTTP along with getting the result set as JSON, why can’t we have a simple Web UI for monitoring the database server and running queries built into the database server?

Yes we can.

If you wanted a WHERE condition or anything else? Easy – change the query, hit execute.

No TCP connection or parsing the MySQL protocol or anything. Just HTTP requests straight to the database server from the browser, with a bit of client-side JavaScript producing the HTML for the table.
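That client-side bit amounts to something like the following hypothetical sketch (the element id and the lack of error handling are mine, not from the proof of concept):

// POST the SQL to the server, then render the JSON result set as a table.
function runQuery(sql) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", "/0.1/sql", true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4)
      return;
    var response = JSON.parse(xhr.responseText);
    var html = "<table>";
    for (var i = 0; i < response.result_set.length; i++) {
      html += "<tr><td>" + response.result_set[i].join("</td><td>") + "</td></tr>";
    }
    html += "</table>";
    document.getElementById("results").innerHTML = html;
  };
  xhr.send(sql);
}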

Proof of concept code is up on launchpad in lp:~stewart/drizzle/json-interface

HailDB: A NoSQL API Direct to InnoDB

At the MySQL Conference and Expo last week I gave a session on HailDB. I’ve put the slides up on slideshare so you can either flick through them online or download them. I think the session went well, and there certainly is some interest in HailDB out there (which is great!).

Speaking on Tuesday: HailDB and Dropping ACID: Eating Data in a Web 2.0 Cloud World

I’m giving two talks tomorrow (Tuesday) at the MySQL Conference and Expo:

HailDB: A NoSQL API direct to InnoDB, 2:00pm, Ballroom D

Dropping ACID: Eating Data in a Web 2.0 Cloud World, 3:05pm, Ballroom G

The HailDB talk is all about a C API for embedding an InnoDB-based relational database engine into your application. Awesome stuff (also nice and technical).

The second talk, “Dropping ACID: Eating Data in a Web 2.0 Cloud World” is not only a joke that only database people get, but a humorous and serious look at data integrity and reliability as promised by the current hype. This was quite well received at linux.conf.au in January. So, if you weren’t in Australia in January this year, then certainly come along and see how you go heckling an Australian.

innodb and memcached

I had a quick look at the source tree that’s up as a tarball on labs.mysql.com for the memcached interface to InnoDB (I haven’t compiled it, just read the source – that’s what I do. I challenge any C/C++ compiler to keep up with my brain!). A few quick thoughts:

  • Where’s the Bazaar tree on launchpad? I hate pulling tarballs; following the dev tree is much more interesting from a tech perspective (especially for early development releases). I note that the NDB memcached stuff is up on launchpad now, so yay there. I would love it if the InnoDB team in general were much more open with development, especially with having source trees up on launchpad.
  • It embeds a copy of the memcached server engines branch into the MySQL tree. This is probably the correct way to go. There is no real sense in re-implementing the protocol and network stack (that’s about half of what memcached is anyway).
  • The copy of the memcached engine branch seems to be a few months old.
  • The current documentation appears to be the source code.
  • The innodb_memcached plugin embeds a memcached server using an API to InnoDB inside the MySQL server process (basically so it can access the same instance of InnoDB as a running MySQL server).
  • There’s a bit of something that looks kind-of similar to the Embedded InnoDB (now HailDB) API being used to link InnoDB and memcached together. I can understand why they didn’t go through the MySQL handler interface… that would be bracing, to say the least, to get correct. Going via InnoDB APIs makes it much more likely there’ll be fewer bugs.
  • If this accepted JSON and spat it back out… how fast would MongoDB die? weeks? months?
  • The above dot point would be a lot more likely if adding a column to an InnoDB table didn’t involve epic amounts of IO.
  • I’ve been wanting a good memcached protocol inside Drizzle; we have, of course, focused on the stability of what we do have first. That being said…. upgrade my flight home so I can open a laptop… it’d probably be done well before I land….. (assuming I don’t get to it among the 15 other awesome things I want to hack on this week)