bzr-loom – a bzr plugin with quilt like functionality

A bzr plugin to assist in developing focused patches. in Launchpad

I use quilt a lot for development. Currently, If I had to choose between BK and quilt – I’d choose quilt.

I use bzr in other development projects like MemberDB. I use git as a frontend for SVN (it is *so* much faster than the svn client and incredibly more space efficient… A copy of the entire history of a tree stored in git is usually less than a single svn checkout). I also use darcs (and quilt) for offlineimap and just about every other revision control tool at some point.

So this is a bit of a discussion about how I work and how bzr-loom would help it… (I’ve wished for a long time that bk had stuff like this… bk collapse is just not what I want, although others use it lots).

The loom plugin to bzr looks like a fantasy world of goodness where the revision control system has some knowledge of these work in progress patches. The ability to push and pull looms around the place seems awfully nice.

What’s even more awsome is that you can push your set of patches up to a normal bzr branch and they become normal commits! i.e. you get rid of the whole “convert quilt patches into changesets” pain and just push.

Revision tracking them (so you can see what you’ve changed in your patch set) is also nice (I have thought about keeping patches/ in a bzr repo for this purpose). So I can now get a history of my patchset against various mainline versions.

One of the big advantages of quilt is speed – it’s lightning fast (basically being a diff and patch wrapper) . Hopefully bzr looms continue in this fine tradition (and I wish other systems would get something like it too)

MemberDB speed improvements

So I finally installed the xdebug PHP extension and started doing some performance analysis of MemberDB using xdebug and kcachegrind. The upshot of which is a number of commits to the bzr tree that dramatically improve performance in several key areas. The answer? Caching.

I’m not even talking using memcached or caching things in database tables or anything like that – just about everything is still the same dynamically produced content as before, but I’m now caching some simple things avoiding many round-trips to the database while executing a script.

There were a few things that were taking a fair bit of execution time:

  1. The generation of the menu. In MemberDB, there’s a menu on the left. There’s also a powerful (read: non-trivial) permissions system allowing relatively fine grained granting of permissions. So, we need to check that the user has permission to go to the page before showing the page in the menu.
    Previously, for each item in the menu, we’d do a lookup to the database – checking if they have the permission or they are an admin. This ended up taking a bit of time – up to 30% of the time for the front page was taken up just generating the menu!
    So, now I cache the set of permissions for the user. One function to fetch it from the DB into a structure, another function to check the permissions of the user in that struct.
    While testing this, I actually used memcached to cache the menu to see how much of an improvement I could get… I’m about 69/70ths of the speed of using memcached with a purely PHP implementation caching the permissions info.
  2. Getting the information about a member is done in a variety of places. On some pages, you want information on the current logged in user (or just need to find their member ID). These are now cached for the duration of the script. Saved quite a few DB round trips
  3. When viewing an election (not the results, just the normal “view election” page that lists candidates), we need to get the membership information on a number of users (okay… so technically I should rewrite some of the queries to use joins in the DB… but this was easier). I now have a (limited) cache of membership info. So now, when a member has nominated multiple people, we only pull the member info out of the database once.
  4. Rewrite the “current_members” view. The old one was not as efficient as it could be. While the new one has slightly different semantics (can have duplicate rows, it turns out the use of DISTINCT was adding a bit of execution time, which for a bunch of queries is not needed) it’s significantly quicker.

I used the faithful Apache Bench (ab) to do benchmarks against the modified PHP code. I think the biggest improvement was the view election page which went from about 6seconds/page to 0.2seconds/page.

Storing Passwords (securly) in MySQL

Frank talks about Storing Passwords in MySQL. He does, however, miss something that’s really, really important. I’m talking about the salting of passwords.

If I want to find out what  5d41402abc4b2a76b9719d911017c592 or 015f28b9df1bdd36427dd976fb73b29d MD5s mean, the first thing I’m going to try is a dictionary attack (especially if i’ve seen a table with only user and password columns). Guess what? A list of words and their MD5SUMS can be used to very quickly find what these hashes represent.

I’ll probably have this dictionary in a MySQL database with an index as well. Try it yourself – you’ll probably find a dictionary with the words “hello” and “fire” in it to help. In fact, do this:

mysql> create table words (word varchar(100));
Query OK, 0 rows affected (0.13 sec)
mysql> load data local infile ‘/usr/share/dict/words’ into table words;
Query OK, 98326 rows affected (0.85 sec)
Records: 98326  Deleted: 0  Skipped: 0  Warnings: 0

mysql> alter table words add column md5hash char(32);
Query OK, 98326 rows affected (0.39 sec)
Records: 98326  Duplicates: 0  Warnings: 0

mysql> update words set md5hash=md5(word);
Query OK, 98326 rows affected (3.19 sec)
Rows matched: 98326  Changed: 98326  Warnings: 0
mysql> alter table words add index md5_idx (md5hash);
Query OK, 98326 rows affected (2.86 sec)
Records: 98326  Duplicates: 0  Warnings: 0
mysql> select * from words where md5hash=’5d41402abc4b2a76b9719d911017c592′;
+——-+———————————-+
| word  | md5hash                          |
+——-+———————————-+
| hello | 5d41402abc4b2a76b9719d911017c592 |
+——-+———————————-+
1 row in set (0.11 sec)
mysql> select * from words where md5hash=’015f28b9df1bdd36427dd976fb73b29d’;
+——+———————————-+
| word | md5hash                          |
+——+———————————-+
| fire | 015f28b9df1bdd36427dd976fb73b29d |
+——+———————————-+
1 row in set (0.00 sec)
$EXCLAMATION I hear you go.

Yes, this is not a good way to “secure” passwords. Oddly enough, people have known about this for a long time and there’s a real easy  solution. It’s called salting.

Salting is prepending a random string to the start of the password when you store it (and when you check it).

So, let’s look at how our new password table may look:

mysql> select * from passwords;
+——+——–+———————————-+
| user | salt   | md5pass                          |
+——+——–+———————————-+
| u1   | ntuk24 | ce6ac665c753714cb3df2aa525943a12 |
| u2   | drc,3  | 7f573abbb9e086ccc4a85d8b66731ac8 |
+——+——–+———————————-+
2 rows in set (0.00 sec)
As you can see, the MD5s are different than before. If we search these up in our dictionary, we won’t find a match.

mysql> select * from words where md5hash=’ce6ac665c753714cb3df2aa525943a12′;
Empty set (0.01 sec)

instead, we’d have to get the salt and do an md5 of the salt and the dictionary word and see if the md5 matches. Guess what, no index for that! and with all the possible values for salt, we’ve substantially increased the problem space to construct a dictionary (i won’t go into the maths here).

mysql> create view v as select word, md5(CONCAT(‘ntuk24′,word)) as salted from words;
Query OK, 0 rows affected (0.05 sec)

mysql> select * from v where salted=’ce6ac665c753714cb3df2aa525943a12’;
+——-+———————————-+
| word  | salted                           |
+——-+———————————-+
| hello | ce6ac665c753714cb3df2aa525943a12 |
+——-+———————————-+
1 row in set (2.04 sec)

mysql> create or replace view v as select word, md5(CONCAT(‘drc,3′,word)) as salted from words;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from v where salted=’7f573abbb9e086ccc4a85d8b66731ac8’; +——+———————————-+
| word | salted                           |
+——+———————————-+
| fire | 7f573abbb9e086ccc4a85d8b66731ac8 |
+——+———————————-+
1 row in set (2.12 sec)

So we’ve gone from essentially instantaneous retreival, to now taking about 2 seconds. Even if I assume that one of your users is going to be stupid enough to have a dictionary password, It’s going to take me 2 seconds to check each user – as the salt is different for each user! So it could take me hours just to find that user. Think about how many users are in your user table – with 1000 users, it’s over 1/2hr. For larger systems, it’s going to be hours.

MemberDB – A Membership Database 0.4

MemberDB – A Membership Database

I released 0.4 today. A rather long awaited release. No doubt there’ll be bugs and the need for a 0.4.1 or something – but this is relatively bug free and has a bunch of new cool stuff to chew on.

It’s also the first real release to support MySQL 5.0 (previous releases won’t work as MemberDB heavily uses views).

must be time to use the OSDC conference registration/paper submission site

it’s annoying. grr.

but, on the other hand, I am speaking about MySQL 5.0 at OSDC.

This is even cooler as 5.0 has gone GA. So it’s not “upcoming features” it’s the “here and now”.

I’ll now have to release MemberDB 0.4 (the MySQL release). Converting the Linux Australia installation over at some point soon too. The 0.4 tree fixes enough bugs that it’s worth it (one of which Pia found the other day).

MemberDB election-results performance on new laptop

So I picked up my new laptop on friday. It’s an ASUS V6V – nice and fast, light, good resolution screen and lots of disk and RAM (it came with 1GB, I’ve got 2GB).

Anyway, the transfer of data from my PowerBook went fine. I waited for xfsdump to dump /home from the powerbook to a firewire drive (and for “waiting” I do mean going out and seeing Charlie and the Chocolate Factory – which was very good).

Installing Ubuntu on the ASUS went like a dream. Everything, and i do mean everything worked out-of-the-box with only one tweak. That was uncommented the ACPI sleep configuration option do-dad in /etc/default/acpi-something-foo to get suspend to ram working.

The WEP didn’t work in the installer, so I initially just used the GigE adapter until the first reboot.

The firewire drivers don’t really behave with this laptop atm… that dreaded “aborted sbp2 command” error too often – so abandoned that and futzed around with a private net and NFS to xfsrestore /home.

Go to bed, awake later to find /home on new laptop (with an extra 23GB of free space!). I had to, of course – rebuild those essential packages for x86 instead of ppc – namely wesnoth.

oh, and cleaning out the ppc binaries from my mysql bk trees and doing a x86 build (I also had to change my CC from ‘ccache distcc powerpc-linux-gcc’ to ‘ccache distcc i386-linux-gcc’). One thing is for certain, it’s quicker at building things – even if the fan ramps up a bit when doing so :)

MySQL builds pretty quickly when you have a 2.8Ghz P4 and a 2.13Ghz Pentium M building it.

Anyway, set up all the apache foo for hacking on the LA website and MemberDB today. A load of the elections-result page on digital (the LA server – dual PIII 1.133Ghz) takes about fourteen or fifteen seconds using PostgreSQL as the database.

I previously reported that using MySQL (InnoDB tables) I got about twice the performance on my old laptop (1Ghz G4).

Well, on this one (2.13Ghz Pentium M) I’m getting the page loading in under three seconds. Sweet. Maybe I won’t go ahead and try to optimise some of the queries :)

(the query cache is probably coming into this – but i did do the query several times – so it’s not as if there’s any unfair advantage anywhere).

I’m using the 5.0.12-max-beta gcc dynamic build as downloaded from mysql.com for these runs. All other packages (apache2, php) are as shipped in Ubuntu. The my.ini file is as-shipped (err.. i think so: no query log, no binlog, slow query log enabled and some paths changed)

Election results page performance

Did the switch of the election-results page from postgresql to mysql today. It’s about twice as fast (crappy statistics there, but it’s an approximation).

This is using InnoDB tables.

With prepared statements we should be able to increase performance even further. I enabled the query log for a load of the page – we’re doing about 3,500 queries. Hrrm…. not so good. But using prepared statements should give us maybe another decent boost (a fair bit IIRC the parser overhead)

MySQL 5.0.10 released!

The next beta of 5.0 is out – 5.0.10. So go ahead and download it. You’re not cool unless you download it (you can find it here http://dev.mysql.com/downloads/mysql/5.0.html)

File swap it, torrent it, burn it to cds and give it to friends. Don’t you just love free software!

This release fixes a bunch of bugs that were holding back MemberDB from being usable on MySQL. This was bugs in new features, and the main one had already been reported.

There’s also been a few fixes for cluster which is always a good thing. Nothing show stopping though. We must be good :)

Only 18 bugs for MemberDB 0.4!

Yes “only” 18….

although the “make installation procedure not suck” has to be the most important.

I’m very tempted to branch and make a 0.3.1 release the “no, it really works this time” release. mainly because there were still a few annoying bugs (being female could cause you trouble if you messed with the edit-member page. It would store it okay, just always display it wrong (which meant that it *looked* like it wasn’t recording you as female).

Hopefully I’ve stopped the javascript error box popping up on some platforms too.

MySQL port of MemberDB

Spent probably about 3 hours today porting the database schema to MySQL 5 along with finding some bugs in the process. Pretty minor ones, mainly to do with how things could be improved to improve compatibility with schemas written with postgresql in mind.

While chasing up some stuff on why the serial type alias in mysql wasn’t exactly the same as postgresql serial type (which is an integer with a sequence and default value) i found this gem in the postgresql docs:

Note: Prior to PostgreSQL 7.3, serial implied UNIQUE. This is no longer automatic. If you wish a serial column to be in a unique constraint or a primary key, it must now be specified, same as with any other data type.

Great huh? So upgrade postgresql and don’t go sifting through your tables (now come on, everybody uses a serial/auto_increment field in a lot of tables) you loose!

i.e. there’s bugs in memberdb now that weren’t there when i started and i didn’t change any code to make them. hrrm…

anyway, i’ll write at some time the few easy steps it took to get the schema across (it takes no time once you know what you’re doing – like a few commands and a few search and replaces).

for now, you can get stuff from arch: stewart@flamingspork.com–memberdb/memberdb–mysql–0.4

the schema loads, i’ll have to change one bit of code to make it all work – otherwise everything should be fine (but let me test first – or provide fixes, not complaints :)

branched stable–0.3

no, it’s not released – but i made a branch that should be a relatively stable version of 0.3.

it has none of the election work (it’s not finished) – and in hindsight, maybe it should have gone into a seperate branch… but oh well.

been doing some security fixes as well… and updating the installed copy on digital as i do.

Additional code into voting

you can now accept nominations and put down your spiel.

At some point I’m going to have to go through a bunch of the code and do a security audit. We should be fine with what we’re running now, but the head of 0.3 most likely isn’t. All that will be problematic is insertion of crap into the outputted HTML.

There’s also been a number of improvements recently in the forms infrastructure which the older parts of the UI could benefit from.

Not that speed is a current problem, but a bit of caching could really help some things along – esp with get_member_id, get_member and has_permission. We really don’t need to go back to the database multiple times in the same page display. Temporary variables can just be messy too – much better to keep the code clean.

making a nominee a candidate

added a column to election. you now have a “number of required nominations”.

in a normal election, this will be two. One person nominates, the other seconds.

When there are enough nominations, the person can accept it, and then an entry is created in election_candidate.

at least this will be the case… soon.

About to go and write the accept nomination page thingy.