Patching your mission-critical email syncing software on your live setup… my OfflineIMAP patch for today

I’ve used OfflineIMAP for quite a while now. On the whole I’m fairly happy with it. Today I sent this to the list:

Forgive the potentially bad Python; it's not my native tongue :)

This patch is motivated by three things:
- offlineimap is extremely slow at syncing lots of locally deleted messages
- offlineimap uses lots of memory
- LocalStatus files aren't written safely (a hard crash can cause corruption)
  - I've been bitten by this in the past: it caused a complete resync of the folder... so I got duplicate messages.
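A minimal sketch of why a SQLite-backed status store is crash safe, assuming a hypothetical `status` table (one row per UID plus its flags) rather than offlineimap's real LocalStatus layout. Writes happen inside a transaction, so they land completely or not at all. Here an exception stands in for the crash; on disk, SQLite's journal gives the same all-or-nothing result after a real one. `open_status` and `save_messages` are illustrative names, not offlineimap functions.

```python
import sqlite3

# Hypothetical schema: one row per message UID with its flags.
# This is NOT offlineimap's LocalStatus format; it only illustrates
# transactional (all-or-nothing) updates.
def open_status(path):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status ("
        " uid INTEGER PRIMARY KEY,"
        " flags TEXT NOT NULL)"
    )
    conn.commit()
    return conn

def save_messages(conn, messages):
    """Write {uid: flags} atomically: either every row lands or none do."""
    # `with conn:` commits on success and rolls back if an exception escapes.
    with conn:
        for uid, flags in messages.items():
            conn.execute(
                "INSERT OR REPLACE INTO status (uid, flags) VALUES (?, ?)",
                (uid, flags),
            )

if __name__ == "__main__":
    conn = open_status(":memory:")
    save_messages(conn, {1: "S", 2: "RS"})
    try:
        # Simulate a failure partway through an update.
        with conn:
            conn.execute("UPDATE status SET flags = 'T' WHERE uid = 1")
            raise RuntimeError("simulated crash mid-update")
    except RuntimeError:
        pass
    # The half-done update was rolled back; the old state survives.
    print(conn.execute("SELECT uid, flags FROM status ORDER BY uid").fetchall())
    # → [(1, 'S'), (2, 'RS')]
```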

I am currently using offlineimap 4.0.14 (from Debian) with this patch, and I used it to convert my existing LocalStatus files. It seems quite reliable and quick.

In my tests, execution time for a normal sync is roughly the same.

When lots of messages have been deleted from a reasonably sized folder (e.g. during re-organisation of mail folders), the sync is as much as 10x faster.
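Why deletions can get cheaper with a database, sketched under assumptions: with a hypothetical `status` table keyed by UID, a batch of locally deleted messages becomes indexed `DELETE`s inside one transaction, so SQLite pays for a single commit rather than one per message. This illustrates the general technique only; it is not a claim about exactly where the patch's 10x comes from. `make_status`, `remove_uids` and `count` are hypothetical helpers.

```python
import sqlite3

def make_status(n):
    """Build an in-memory hypothetical status table holding UIDs 1..n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE status (uid INTEGER PRIMARY KEY, flags TEXT NOT NULL)")
    with conn:
        conn.executemany(
            "INSERT INTO status (uid, flags) VALUES (?, ?)",
            ((uid, "S") for uid in range(1, n + 1)),
        )
    return conn

def remove_uids(conn, uids):
    """Delete every UID in `uids` inside ONE transaction: a single commit for the batch."""
    with conn:
        # Each DELETE is a primary-key lookup, not a rewrite of the whole file.
        conn.executemany("DELETE FROM status WHERE uid = ?", ((uid,) for uid in uids))

def count(conn):
    return conn.execute("SELECT COUNT(*) FROM status").fetchone()[0]

if __name__ == "__main__":
    conn = make_status(10_000)
    # Pretend a folder reorganisation deleted 9,000 messages locally.
    remove_uids(conn, range(1, 9_001))
    print(count(conn))  # → 1000
```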

In my tests, running with 1 thread uses as much as 20% less memory with this patch (i.e. about 160MB instead of 200MB+ for my maildir).

The LocalStatus files don't use much more disk space... for me it's 6.5MB now versus 4.5MB before. We get the added benefit of indexes for all our queries... nice :)
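To see the indexing point concretely, SQLite's `EXPLAIN QUERY PLAN` reports whether a query searches an index or scans the whole table. This sketch assumes a hypothetical `status` table whose `uid INTEGER PRIMARY KEY` doubles as the rowid; the exact plan wording differs between SQLite versions, so only the key phrases are worth relying on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical status table. `uid INTEGER PRIMARY KEY` makes uid an alias for
# the rowid, so lookups by uid walk SQLite's B-tree instead of every row.
conn.execute("CREATE TABLE status (uid INTEGER PRIMARY KEY, flags TEXT NOT NULL)")

def explain(sql, params=()):
    """Return the 'detail' text (last column of each row) from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Lookup by the indexed key: the plan mentions "USING INTEGER PRIMARY KEY".
print(explain("SELECT flags FROM status WHERE uid = ?", (42,)))
# Filter on an unindexed column: the plan is a full table SCAN.
print(explain("SELECT uid FROM status WHERE flags = ?", ("S",)))
```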

I had to disable threading for copying messages, as it means LocalStatus objects are shared between threads, which pysqlite doesn't like (it asserts).
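The thread restriction is easy to reproduce with Python's standard-library `sqlite3` module (the successor to pysqlite): by default (`check_same_thread=True`), a connection used from a thread other than the one that created it raises `sqlite3.ProgrammingError`. The second half shows a common alternative to disabling threading, one connection per thread against the same database file. That is not what this patch does, and `run_in_thread` and the file name `status.db` are hypothetical.

```python
import os
import sqlite3
import tempfile
import threading

def run_in_thread(fn):
    """Run fn() in a new thread; return the exception it raised, or None."""
    result = {}

    def target():
        try:
            fn()
        except Exception as exc:  # capture so the main thread can inspect it
            result["error"] = exc

    t = threading.Thread(target=target)
    t.start()
    t.join()
    return result.get("error")

db_path = os.path.join(tempfile.mkdtemp(), "status.db")
shared = sqlite3.connect(db_path)  # check_same_thread=True by default

# 1. Sharing one connection across threads is rejected by the sqlite3 module.
err = run_in_thread(lambda: shared.execute("SELECT 1"))
print(type(err).__name__)  # → ProgrammingError

# 2. Giving each thread its own connection to the same file works.
def own_connection():
    conn = sqlite3.connect(db_path)
    conn.execute("SELECT 1")
    conn.close()

print(run_in_thread(own_connection))  # → None
```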

I think the part of this patch that implements uidexists actually slows things down compared with having the messagelist... a more optimal implementation may be possible, but I think the other speed improvements (and memory savings) are worth it.
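A sketch of the uidexists trade-off, assuming a hypothetical `SqliteStatus` class with a `status` table keyed by UID (the `set_flags` helper is illustrative too, not offlineimap's API). Each call is a single primary-key lookup, so nothing extra sits in memory, but it costs a query where a dict-based messagelist answers `uid in messagelist` directly.

```python
import sqlite3

class SqliteStatus:
    """Hypothetical sqlite-backed status store, for illustration only."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS status ("
            " uid INTEGER PRIMARY KEY, flags TEXT NOT NULL)"
        )

    def set_flags(self, uid, flags):
        """Insert or update one UID's flags (hypothetical helper)."""
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO status (uid, flags) VALUES (?, ?)",
                (uid, flags),
            )

    def uidexists(self, uid):
        # One primary-key lookup per call: nothing extra held in memory,
        # but every call pays query overhead that `uid in messagelist` avoids.
        row = self.conn.execute(
            "SELECT 1 FROM status WHERE uid = ? LIMIT 1", (uid,)
        ).fetchone()
        return row is not None

if __name__ == "__main__":
    status = SqliteStatus()
    status.set_flags(42, "S")
    print(status.uidexists(42), status.uidexists(43))  # → True False
```

When a sync does many lookups in a row, loading the UIDs into a set once would be faster again, at the cost of some of the memory the patch saves.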

A future patch may convert other storage types to sqlite (or similar) to
further reduce memory consumption (and hopefully runtime).

This does add a dependency on pysqlite, which is packaged in Debian (and Ubuntu); I'm using the stock packages.

Comments very much appreciated.
Of course, the patch is here. I’m using it now… but be warned: it updates your .offlineimap to a new format, and there’s no way back short of restoring the backed-up LocalStatus files (and probably getting duplicate messages).

Those in the MySQL circles I tend to hang around may ask “Why not libmysqld?” (the embedded MySQL server). Well… a few reasons: sqlite is file-per-db (even though I’m essentially using file-per-table here), the Python bindings are everywhere (and work), and it’s tiny and crash safe.

You may also ask “Why?”… well, I’ve been re-organising a bunch of mail folders, which means deleting a *lot* of messages from some folders (and moving them to others). offlineimap has been really slow at this, so I fixed it, with code (not whining).

I also wrote a bit-of-a-hack Perl script to remove duplicate messages from a bunch of folders (a bug in offlineimap had left me with several copies of each message in many of my folders a while ago). That script is here. Commented out are bits to compare via MD5 as well as Message-ID. Don’t use it unless you know what you’re doing… it may also use a few hundred MB of RAM on large folders (a few hundred thousand messages).
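For illustration only, here’s the same idea sketched in Python with the standard `mailbox` module rather than the Perl script itself: walk a Maildir, remember each Message-ID (or, optionally, an MD5 of the full message), and report later copies. `find_duplicates` is a hypothetical name, and unlike the script it only lists duplicates and deletes nothing. The `seen` set holds one entry per unique message, which is where the memory goes on very large folders.

```python
import hashlib
import mailbox

def find_duplicates(maildir_path, use_md5=False):
    """Return keys of messages whose Message-ID (or full-content MD5) was
    already seen earlier in the folder. Reports only; deletes nothing."""
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    seen = set()
    duplicates = []
    for key in sorted(box.keys()):  # sorted so "first copy kept" is deterministic
        if use_md5:
            ident = hashlib.md5(box.get_bytes(key)).hexdigest()
        else:
            ident = box[key].get("Message-ID")
            if ident is None:
                continue  # no ID to compare on; leave the message alone
            ident = ident.strip()
        if ident in seen:
            duplicates.append(key)
        else:
            seen.add(ident)
    return duplicates
```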

Hopefully these will help improve my productivity.
Now, back to my regular programming….

5 thoughts on “Patching your mission-critical email syncing software on your live setup… my OfflineIMAP patch for today”

  1. err… yeah… as has been pointed out to me :)

    I have all sorts of Emacs foo to make sure the right modes are set for the right source trees for C/C++ code… I just haven’t had to write Python sanely for… err… years :)

    Turns out I have no useful modes in Emacs for editing Python sanely.

    Guess this is what happens when the last time you hacked Python with much determination was maybe 4 years ago…

    Know a good Emacs mode?
