The ARCHIVE Storage Engine

I wonder how much longer the ARCHIVE storage engine is going to ship with MySQL… I think I’m the last person to have actually fixed a bug in it, and that was, well, a good number of years ago now. It was created to solve a simple problem: write once, read hardly ever. Useful for logs and the like. A zlib stream of rows in a file.
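To make “a zlib stream of rows in a file” concrete, here is a toy sketch using plain zlib. It is not ARCHIVE’s actual on-disk format (the real thing carries its own header and metadata); it just shows the general shape: fixed-size rows appended to one compressed file.

/* Toy sketch: append fixed-size rows to a single zlib-compressed file.
 * Not ARCHIVE's real format, just the idea.
 * Build: cc rows.c -lz
 */
#include <stdio.h>
#include <zlib.h>

struct row {
    unsigned int id;
    char msg[60];
};

int main(void)
{
    /* "ab" = append, binary: each run adds more rows to the same stream. */
    gzFile f = gzopen("rows.gz", "ab");
    if (f == NULL) {
        fprintf(stderr, "gzopen failed\n");
        return 1;
    }

    struct row r = { 1, "write once, read hardly ever" };

    if (gzwrite(f, &r, sizeof(r)) != (int)sizeof(r)) {
        fprintf(stderr, "gzwrite failed\n");
        gzclose(f);
        return 1;
    }

    gzclose(f);
    return 0;
}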

You can actually easily beat ARCHIVE for INSERT speed with a non-indexed MyISAM table, and with things like TokuDB around you can probably get pretty close to its compression while at the same time having these things known as “indexes”.

ARCHIVE for a long time held this niche though and was widely and quietly used (and likely still is). It has the great benefit of being fairly lightweight – it’s only about 2500 lines of code (1130 if you exclude azio.c, the slightly modified gzio.c from zlib).

It also uses the table discovery mechanism that NDB uses. If you remove the FRM file for an ARCHIVE table, the ARCHIVE storage engine will extract the copy it keeps to replace it. You can also do consistent backups with ARCHIVE, as it’s an append-only engine. ARCHIVE was certainly the simplest example code for table discovery and a few other parts of the storage engine API.

I’d love to see someone compare storage space and performance of ARCHIVE against TokuDB and InnoDB (hint hint, the Internet should solve this for me).

13 thoughts on “The ARCHIVE Storage Engine”

  1. I still use it for archiving massive amounts of data and it’s great. I don’t think a transactional engine would be a suitable replacement because of its inherent overhead (transaction log, flushing, etc.). But I’ll let whoever takes on the task actually prove it. :)

  2. That partitioning bug got marked private as it could crash the server, so it wasn’t too visible (it was visible to me as I was the reporter…)

  3. I’m wondering if it’s possible to modify the archive table code to make it create the rows as separate gzip records and add a lookup table of some kind to allow fast random-access reads of compressed records by id?

    Or perhaps an option to return the byte offset at which a record starts in the file, so people can put those in another (small) table (plus a way to get a single record by its offset)?
    That way it could read a record from a large compressed table without needing to uncompress and scan the whole thing.

  4. Well, it’s software – so of course you can! You’d likely be better off compressing groups of records into chunks and “indexing” based on those, so you only have to decompress a 64MB (for example) chunk rather than the whole table when you need a row from it. But the tricky bit is working out an efficient indexing scheme. Honestly, TokuDB is probably going to work better for your use case than writing a custom variant of ARCHIVE.
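    Very roughly, the chunk-plus-offset-index idea looks like the sketch below, done with plain zlib. The file layout here is entirely made up for illustration and has nothing to do with ARCHIVE’s real format.

    /* Toy sketch of the chunked idea: compress fixed groups of rows
     * separately and keep a small offset index, so fetching one row only
     * means decompressing one chunk rather than the whole table.
     *
     * Build: cc chunks.c -lz
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <zlib.h>

    #define ROWS_PER_CHUNK 1024
    #define ROW_SIZE       64   /* fixed-width rows keep the arithmetic simple */

    struct chunk_index {
        long  offset;   /* where the compressed chunk starts in the file */
        uLong clen;     /* its compressed length */
    };

    /* Compress and append one full chunk of rows, recording where it landed. */
    static int write_chunk(FILE *f, const unsigned char *rows,
                           struct chunk_index *idx)
    {
        uLong srclen = (uLong)ROWS_PER_CHUNK * ROW_SIZE;
        uLong clen = compressBound(srclen);
        unsigned char *buf = malloc(clen);
        int rc = -1;

        if (buf != NULL && compress(buf, &clen, rows, srclen) == Z_OK) {
            idx->offset = ftell(f);
            idx->clen = clen;
            if (fwrite(buf, 1, clen, f) == clen)
                rc = 0;
        }
        free(buf);
        return rc;
    }

    /* Fetch a single row by number: decompress only the chunk containing it. */
    static int read_row(FILE *f, const struct chunk_index *idx,
                        unsigned long rowno, unsigned char *out)
    {
        const struct chunk_index *c = &idx[rowno / ROWS_PER_CHUNK];
        uLong dlen = (uLong)ROWS_PER_CHUNK * ROW_SIZE;
        unsigned char *cbuf = malloc(c->clen);
        unsigned char *dbuf = malloc(dlen);
        int rc = -1;

        if (cbuf != NULL && dbuf != NULL &&
            fseek(f, c->offset, SEEK_SET) == 0 &&
            fread(cbuf, 1, c->clen, f) == c->clen &&
            uncompress(dbuf, &dlen, cbuf, c->clen) == Z_OK) {
            memcpy(out, dbuf + (rowno % ROWS_PER_CHUNK) * ROW_SIZE, ROW_SIZE);
            rc = 0;
        }
        free(cbuf);
        free(dbuf);
        return rc;
    }

    int main(void)
    {
        /* Write one chunk of made-up rows, then pull row 42 straight back out. */
        static unsigned char rows[ROWS_PER_CHUNK * ROW_SIZE];
        struct chunk_index idx[1];
        unsigned char row[ROW_SIZE];
        unsigned long i;

        for (i = 0; i < ROWS_PER_CHUNK; i++)
            snprintf((char *)rows + i * ROW_SIZE, ROW_SIZE, "row %lu", i);

        FILE *f = fopen("chunks.dat", "w+b");
        if (f == NULL || write_chunk(f, rows, idx) != 0)
            return 1;

        if (read_row(f, idx, 42, row) == 0)
            printf("%s\n", (char *)row);

        fclose(f);
        return 0;
    }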
