The ARCHIVE Storage Engine

I wonder how much longer the ARCHIVE storage engine is going to ship with MySQL… I think I’m the last person to have actually fixed a bug in it, and that was a good number of years ago now. It was created to solve a simple problem: write once, read hardly ever. Useful for logs and the like. A zlib stream of rows in a file.
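That row format can be sketched in a few lines of Python. This is a toy model of the idea, not ARCHIVE’s actual on-disk layout (which lives in azio); the file and function names are hypothetical:

```python
import gzip

def append_rows(path, rows):
    # Append-only writes: each call adds a new gzip member to the file.
    # gzip readers treat concatenated members as one stream, so the
    # file stays valid without rewriting anything already written.
    with gzip.open(path, "ab") as f:
        for row in rows:
            f.write(row.encode() + b"\n")

def scan_rows(path):
    # Reading means decompressing from the start -- there is no index,
    # which is why this layout suits "write once, read hardly ever".
    with gzip.open(path, "rb") as f:
        return [line.decode().rstrip("\n") for line in f]
```

Appends are cheap and sequential; every read is a full decompressing scan, which is fine for log-style data you almost never look at.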

You can actually easily beat ARCHIVE on INSERT speed with a non-indexed MyISAM table, and with things like TokuDB around you can probably get pretty close on compression while also having these things known as “indexes”.

ARCHIVE for a long time held this niche though and was widely and quietly used (and likely still is). It has the great benefit of being fairly lightweight – it’s only about 2500 lines of code (1130 if you exclude azio.c, the slightly modified gzio.c from zlib).

It also uses the table discovery mechanism that NDB uses: if you remove the FRM file for an ARCHIVE table, the ARCHIVE storage engine will extract the copy it keeps to replace it. You can also do consistent backups with ARCHIVE, as it’s an append-only engine. The ARCHIVE engine was certainly the simplest example code of this and a few other storage engine API features.

I’d love to see someone compare storage space and performance of ARCHIVE against TokuDB and InnoDB (hint hint, the Internet should solve this for me).

13 thoughts on “The ARCHIVE Storage Engine”

  1. I still use it for archiving massive amounts of data and it’s great. I don’t think a transactional engine would be a suitable replacement because of its inherent overhead (transaction log, flushes, etc.). But I’ll let whoever takes on the task actually prove it. :)

  2. That partitioning bug got marked private as it could crash the server, so it wasn’t too visible (it was visible to me, as I was the reporter…)

  3. I’m wondering if it’s possible to modify the ARCHIVE table code to make it create the rows as separate gzip records and add a lookup table of some kind to allow fast random-access reads of compressed records by id?

    Or perhaps an option to return the byte offset where a record starts in the file, so people could put those in another (small) table (along with a way to fetch a single record by its offset)? It could then read a record from a large compressed table without needing to uncompress and scan the whole table.

  4. Well, it’s software – so of course you can! You’d likely be better off compressing groups of records into chunks and “indexing” based on those – so you only have to decompress a 64MB (for example) chunk rather than the whole table when you need a row in that chunk. The tricky bit is working out an efficient indexing scheme for it. Honestly, TokuDB is probably going to work better for your use case than writing a custom variant of ARCHIVE.
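The chunked scheme from that reply can be sketched in Python. This is a toy in-memory model with tiny row-count chunks rather than 64MB ones, and all names are hypothetical:

```python
import zlib

def build_chunked(rows, chunk_rows=2):
    # Compress fixed-size groups of rows independently and record each
    # compressed chunk's (first row id, byte offset, length) -- a tiny
    # index into the concatenated blob.
    blob, index = b"", []
    for i in range(0, len(rows), chunk_rows):
        comp = zlib.compress("\n".join(rows[i:i + chunk_rows]).encode())
        index.append((i, len(blob), len(comp)))
        blob += comp
    return blob, index

def read_row(blob, index, row_id, chunk_rows=2):
    # Locate the chunk holding row_id, decompress only that chunk, and
    # pick the row out -- no full-table scan required.
    first, off, length = index[row_id // chunk_rows]
    chunk = zlib.decompress(blob[off:off + length]).decode().split("\n")
    return chunk[row_id - first]
```

The trade-off is exactly the one the reply notes: bigger chunks compress better but cost more to decompress per random read, and the index itself has to be stored and kept consistent with the appended data.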
