{"id":228,"date":"2004-04-10T22:21:20","date_gmt":"2004-04-11T03:21:20","guid":{"rendered":"http:\/\/www.flamingspork.com\/blog\/?p=228"},"modified":"2004-04-10T22:21:20","modified_gmt":"2004-04-11T03:21:20","slug":"tightly-packing-onodes","status":"publish","type":"post","link":"https:\/\/www.flamingspork.com\/blog\/2004\/04\/10\/tightly-packing-onodes\/","title":{"rendered":"Tightly packing onodes"},"content":{"rendered":"<p>The current problem is that an onode, however much we can pack forks into a block, still takes up a minimum of one disk block. A disk block is typically 4kb, tends to want to be bigger (think large media files), and is also the unit of atomicity for disk writes.<\/p>\n<p>So, how do we allow multiple onodes per block?<br \/>\nWe could take the inode table way of doing things, and just have &#8220;an onode is X bytes, and block size\/X = Y many onodes per block&#8221;, but that does leave a lot to be desired &#8211; we may want <b>variable<\/b> sized forks to be stored alongside onodes (e.g. what would typically go in an inode).<\/p>\n<p>One way is to split each block into N &#8220;sub blocks&#8221; or &#8220;chunks&#8221; (or &#8220;insert-cool-name-here&#8221;). Basically have a block bitmap with N bits per block, and a chunk size of block size \/ N. This would allow us to have N onodes per block. Simple to implement (we wouldn&#8217;t even have to change the onode_index, as we could do a simple linear search for the onode, or even store them in onode_num order, enabling a binary search). But, if we had a 256kb block size with, say, N = 8, this means 32k per onode, no matter what. Annoying if the volume has both large media files (where the 256k block size helps), and small files (a Unix-like operating system or even a Maildir). Volumes are now big, and users like having one big file system on them &#8211; so we must be more flexible.<\/p>\n<p>Do we want to (can we?) provide onodes which span blocks? 
My feeling is no, at least not for the onode struct itself. Having an onode which has forks in other blocks seems like a quite reasonable (and indeed, needed) thing to do. So, we could have lots of onodes tightly packed into a disk block, with the forks being in other blocks. Typically though, you probably want at least one fork packed with the onode, as there aren&#8217;t many operations on the onode itself.<\/p>\n<p>So, if onodes (along with some forks) can be a variable size, and we want to pack these into (quite possibly) large blocks, how are we going to do it?<\/p>\n<p>I reckon we can pack them all into one block, with padding where needed (so that no onode crosses an atomicity boundary) and rely on packing things in a block in a cache-friendly manner.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The current problem is that an onode, however much we can pack forks into a block, still takes up a minimum of one disk block. A disk block is typically 4kb, tends to want to be bigger (think &hellip; <a href=\"https:\/\/www.flamingspork.com\/blog\/2004\/04\/10\/tightly-packing-onodes\/\">Continue reading <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[12],"tags":[],"class_list":["post-228","post","type-post","status-publish","format-standard","hentry","category-fcfs"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5a6n8-3G","jetpack-related-posts":[{"id":514,"url":"https:\/\/www.flamingspork.com\/blog\/2005\/11\/29\/disk-space-allocation-part-3-storing-extents-on-disk\/","url_meta":{"origin":228,"position":0},"title":"disk space allocation (part 3: storing extents on disk)","author":"Stewart Smith","date":"2005-11-29","format":false,"excerpt":"Here I'm going to talk about how file systems store what part of the disk a part of the file occupies. If your database files are very fragmented, performance will suffer. How much depends on a number of things however. 
XFS can store some extents directly in the inode (see\u2026","rel":"","context":"In &quot;linux-kernel&quot;","block_context":{"text":"linux-kernel","link":"https:\/\/www.flamingspork.com\/blog\/category\/linux-kernel\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":861,"url":"https:\/\/www.flamingspork.com\/blog\/2007\/07\/12\/reading-maildirs-fast\/","url_meta":{"origin":228,"position":1},"title":"reading maildirs&#8230;. fast&#8230;","author":"Stewart Smith","date":"2007-07-12","format":false,"excerpt":"So, for a side project i'm hacking on, i'm wanting to read in Maildirs really fast (and then pump them into something else... for current purposes I'm just putting everything in one file.. getting the read speed up is of current importance). I've done a bit of experimenting and my\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.flamingspork.com\/blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":84,"url":"https:\/\/www.flamingspork.com\/blog\/2003\/03\/31\/variety-in-allocation-block-sizes\/","url_meta":{"origin":228,"position":2},"title":"variety in allocation block sizes?","author":"Stewart Smith","date":"2003-03-31","format":false,"excerpt":"some studies have shown that for multimedia applications, a larger block size improves throughput (e.g. 256kb blocks). For large media files, the waste of an average 128kb per file is insignificant (over several megabytes to many hundred mb or indeed GB). 
But, for smaller files (typically occupied by configuration files\u2026","rel":"","context":"In &quot;hons-project&quot;","block_context":{"text":"hons-project","link":"https:\/\/www.flamingspork.com\/blog\/category\/hons-project\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":785,"url":"https:\/\/www.flamingspork.com\/blog\/2007\/01\/29\/larger-inodes-make-for-some-happy-apps\/","url_meta":{"origin":228,"position":3},"title":"Larger inodes make for (some) happy apps","author":"Stewart Smith","date":"2007-01-29","format":false,"excerpt":"Mikal talks about Ted talking about Tridge talking about how larger inodes can improve samba4 performance. Well, not just Samba4. Beagle and SELinux are also common heaver users of extended attributes which can often be stored inside the inode (e.g. on XFS). There used to be the case where the\u2026","rel":"","context":"In &quot;linux-kernel&quot;","block_context":{"text":"linux-kernel","link":"https:\/\/www.flamingspork.com\/blog\/category\/linux-kernel\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":511,"url":"https:\/\/www.flamingspork.com\/blog\/2005\/11\/23\/disk-space-allocation-part-1-seeing-whats-happenned\/","url_meta":{"origin":228,"position":4},"title":"disk space allocation (part 1: seeing what&#8217;s happenned)","author":"Stewart Smith","date":"2005-11-23","format":false,"excerpt":"(a little while ago I was writing a really long entry on everything possible. I realised that this would be a long read for people and that less people would look at it, so I've split it up). 
This sprung out of doing work on the NDB disk data tree.\u2026","rel":"","context":"In &quot;linux-kernel&quot;","block_context":{"text":"linux-kernel","link":"https:\/\/www.flamingspork.com\/blog\/category\/linux-kernel\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":147,"url":"https:\/\/www.flamingspork.com\/blog\/2003\/08\/21\/block-allocation-for-transactions-and-incomplete-snapshots\/","url_meta":{"origin":228,"position":5},"title":"block allocation for transactions and incomplete snapshots","author":"Stewart Smith","date":"2003-08-21","format":false,"excerpt":"be able to mark blocks as \"in transaction\" and only have this info recorded in memory, not on disk. allows less writes to disk, as any uncommitted transaction we don't care about on restart. but, when things get to the \"we're ready to commit this\" stage, we're going to have\u2026","rel":"","context":"In &quot;hons-project&quot;","block_context":{"text":"hons-project","link":"https:\/\/www.flamingspork.com\/blog\/category\/hons-project\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/228","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/comments?post=228"}],"version-history":[{"count":0,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/228\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/media?parent=228"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/c
ategories?post=228"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/tags?post=228"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}