Onode unique id versus packing id

This is along the lines of how Reiser chooses to pack things on disk: a heuristic produces numbers that say where and how to pack things.

Onodes get a unique id.
– To be used in indexing (onode_index primarily, but also higher-level indices).

Onodes get a packing id.
– Onodes with similar packing ids get put on disk together (theoretically). Packing ids are not unique, so numbers can overlap. If two onodes share the same packing id, we REALLY want them to be packed together.
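
To make the split between the two ids concrete, here is a minimal sketch in Python. The `Onode` class, its field names and both helper functions are made up for illustration, and sorting by packing id is only one possible way of turning "similar ids end up together" into a placement order; the real heuristic is not decided here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Onode:
    unique_id: int   # never shared; this is what onode_index would key on
    packing_id: int  # placement hint only; may be shared between onodes


def build_index(onodes):
    """Map unique_id -> onode, refusing duplicates (unique ids must be unique)."""
    index = {}
    for onode in onodes:
        if onode.unique_id in index:
            raise ValueError(f"duplicate unique id {onode.unique_id}")
        index[onode.unique_id] = onode
    return index


def placement_order(onodes):
    """Order onodes so that equal packing ids are adjacent and similar ids are close.

    unique_id is used only as a tie-breaker to make the order deterministic.
    """
    return sorted(onodes, key=lambda o: (o.packing_id, o.unique_id))


onodes = [Onode(1, 50), Onode(2, 10), Onode(3, 50), Onode(4, 11)]
for onode in placement_order(onodes):
    print(onode.unique_id, onode.packing_id)
```

Onodes 1 and 3 share packing id 50, so they come out next to each other, while lookups by unique id are unaffected by where anything ends up on disk.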

Tightly packing onodes

The current problem is that an onode, however many forks we can pack into its block, still takes up a minimum of one disk block. A disk block is typically 4 KB, tends to want to be bigger (think large media files), and is also the unit of atomicity for disk writes.

So, how do we allow multiple onodes per block?
We could take the inode table way of doing things, and just have “an onode is X bytes, and block size / X = Y onodes per block”, but that leaves a lot to be desired: we may want variable-sized forks to be stored alongside onodes (e.g. what would typically go in an inode).
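
For reference, the inode table arithmetic looks roughly like this. It is a sketch only: the function names are made up, and the 256-byte onode size in the example is an arbitrary number, not a proposal.

```python
def onodes_per_block(block_size, onode_size):
    """Y = block size / X, rounded down; leftover bytes in the block are wasted."""
    if onode_size <= 0 or onode_size > block_size:
        raise ValueError("onode size must be between 1 and the block size")
    return block_size // onode_size


def locate(onode_num, block_size, onode_size):
    """Return (block number within the table, byte offset inside that block)."""
    per_block = onodes_per_block(block_size, onode_size)
    block, slot = divmod(onode_num, per_block)
    return block, slot * onode_size


# Hypothetical numbers: 4 KB blocks and 256-byte onodes.
print(onodes_per_block(4096, 256))  # 16
print(locate(17, 4096, 256))        # (1, 256)
```

Finding an onode is pure arithmetic, which is the attraction. The cost is that every onode gets exactly X bytes, so there is no room for a variable-sized fork next to it.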

One way is to split each block into N “sub blocks” or “chunks” (or “insert-cool-name-here”). Basically, have a block bitmap with N bits per block, and a chunk size of block size / N. This would allow us to have N onodes per block. It is simple to implement: we wouldn’t even have to change the onode_index, as we could do a simple linear search for the onode within the block, or store them in onode_num order to enable a binary search. But if we had a 256 KB block size and, say, N = 8, this means 32 KB per onode, no matter what. That is annoying if the volume has both large media files (where the 256 KB block size helps) and small files (a Unix-like operating system, or even a Maildir). Volumes are now big, and users like having one big file system on them, so we must be more flexible.
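
A rough sketch of the chunk idea, to show how little machinery it needs. The class and function names are invented for this example, the bitmap is kept as a plain Python integer rather than an on-disk structure, and N = 8 is just the value that gives 32 KB chunks in a 256 KB block.

```python
from bisect import bisect_left


class ChunkedBlock:
    """One block split into n_chunks equal chunks, tracked by an n_chunks-bit bitmap."""

    def __init__(self, block_size, n_chunks):
        if n_chunks <= 0 or block_size % n_chunks:
            raise ValueError("block size must divide evenly into chunks")
        self.n_chunks = n_chunks
        self.chunk_size = block_size // n_chunks
        self.bitmap = 0  # bit i set means chunk i is in use

    def allocate(self):
        """Claim the first free chunk and return its byte offset, or None if full."""
        for i in range(self.n_chunks):
            if not self.bitmap & (1 << i):
                self.bitmap |= 1 << i
                return i * self.chunk_size
        return None

    def free(self, offset):
        """Release the chunk that starts at the given byte offset."""
        self.bitmap &= ~(1 << (offset // self.chunk_size))


def find_chunk(sorted_onode_nums, onode_num):
    """Binary search over onode numbers kept in onode_num order.

    Returns the chunk index holding onode_num, or None if it is not in this block.
    """
    i = bisect_left(sorted_onode_nums, onode_num)
    if i < len(sorted_onode_nums) and sorted_onode_nums[i] == onode_num:
        return i
    return None


block = ChunkedBlock(256 * 1024, 8)
print(block.chunk_size)                 # 32768
print(find_chunk([3, 7, 12, 40], 12))   # 2
```

The fixed `chunk_size` is exactly the inflexibility complained about above: a tiny onode still costs a whole chunk.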

Do we want to (and can we?) provide onodes which span blocks? My feeling is no, at least for the onode struct itself. Having an onode whose forks live in other blocks, on the other hand, seems like a quite reasonable (and indeed needed) thing to do. So we could have lots of onodes tightly packed into a disk block, with the forks being in other blocks. Typically, though, you probably want at least one fork packed with the onode, as there aren’t many operations on the onode itself.

So, if onodes (along with some forks) can be a variable size, and we want to pack these into (quite possibly) large blocks, how are we going to do it?

I reckon we can pack them all into one block, with padding where needed (so that no onode crosses an atomicity boundary), and rely on laying things out within the block in a cache-friendly manner.
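
Here is a minimal sketch of that padding rule. It assumes the atomic write unit is a fixed size smaller than the block (512 bytes in the example, purely as an illustrative value) and that an onode plus its packed forks never exceeds one atomic unit. The function name and the first-come layout order are made up; nothing here tries to be cache friendly yet.

```python
def pack(sizes, block_size, atomic_size):
    """Lay variable-sized onodes out in one block, in the order given.

    Returns the byte offset of each onode. Padding is inserted whenever the
    next onode would otherwise straddle an atomic_size boundary.
    """
    offsets = []
    pos = 0
    for size in sizes:
        if size <= 0 or size > atomic_size:
            raise ValueError("onode must fit inside one atomic unit")
        room = atomic_size - pos % atomic_size  # bytes left in the current unit
        if size > room:
            pos += room  # pad up to the next atomicity boundary
        if pos + size > block_size:
            raise ValueError("block is full")
        offsets.append(pos)
        pos += size
    return offsets


# Hypothetical sizes: the third onode would cross byte 512, so it gets padded up to it.
print(pack([200, 200, 200, 100], block_size=4096, atomic_size=512))
# [0, 200, 512, 712]
```

The wasted space is the padding before each boundary, so packing order matters; sorting by size or by packing id before calling something like `pack` would be the obvious next thing to try.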