{"id":861,"date":"2007-07-12T18:13:53","date_gmt":"2007-07-12T08:13:53","guid":{"rendered":"http:\/\/www.flamingspork.com\/blog\/2007\/07\/12\/reading-maildirs-fast\/"},"modified":"2007-07-12T18:13:53","modified_gmt":"2007-07-12T08:13:53","slug":"reading-maildirs-fast","status":"publish","type":"post","link":"https:\/\/www.flamingspork.com\/blog\/2007\/07\/12\/reading-maildirs-fast\/","title":{"rendered":"reading maildirs&#8230;. fast&#8230;"},"content":{"rendered":"<p>So, for a side project I&#8217;m hacking on, I want to read Maildirs really fast (and then pump them into something else&#8230; for current purposes I&#8217;m just putting everything in one file&#8230; getting the read speed up is of current importance).<\/p>\n<p>I&#8217;ve done a bit of experimenting, and my current method (which seems to be as fast as any) is:<\/p>\n<ol>\n<li>read the directory (cur)<\/li>\n<li>sort the entries by inode number<\/li>\n<li>for each batch of 1000 inodes:\n<ol>\n<li>sort the batch by start block number<\/li>\n<li>read each message in that order<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p>This makes a couple of assumptions:<\/p>\n<ul>\n<li>sequential inode numbers are close to each other on disk (making stat(2) cheaper)<\/li>\n<li>mail messages are small&#8230; likely to be in 1 extent, so the start block is a good metric for locality.<\/li>\n<\/ul>\n<p>Oh, some of this is specific to XFS&#8230; which is what I care about (and it turns out you don&#8217;t need to be root to get an extents list for a file on XFS).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So, for a side project I&#8217;m hacking on, I want to read Maildirs really fast (and then pump them into something else&#8230; for current purposes I&#8217;m just putting everything in one file&#8230; 
getting the read speed up is of &hellip; <a href=\"https:\/\/www.flamingspork.com\/blog\/2007\/07\/12\/reading-maildirs-fast\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[1],"tags":[],"class_list":["post-861","post","type-post","status-publish","format-standard","hentry","category-general"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5a6n8-dT","jetpack-related-posts":[{"id":785,"url":"https:\/\/www.flamingspork.com\/blog\/2007\/01\/29\/larger-inodes-make-for-some-happy-apps\/","url_meta":{"origin":861,"position":0},"title":"Larger inodes make for (some) happy apps","author":"Stewart Smith","date":"2007-01-29","format":false,"excerpt":"Mikal talks about Ted talking about Tridge talking about how larger inodes can improve samba4 performance. Well, not just Samba4. Beagle and SELinux are also common heaver users of extended attributes which can often be stored inside the inode (e.g. on XFS). 
There used to be the case where the\u2026","rel":"","context":"In &quot;linux-kernel&quot;","block_context":{"text":"linux-kernel","link":"https:\/\/www.flamingspork.com\/blog\/category\/linux-kernel\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":514,"url":"https:\/\/www.flamingspork.com\/blog\/2005\/11\/29\/disk-space-allocation-part-3-storing-extents-on-disk\/","url_meta":{"origin":861,"position":1},"title":"disk space allocation (part 3: storing extents on disk)","author":"Stewart Smith","date":"2005-11-29","format":false,"excerpt":"Here I'm going to talk about how file systems store what part of the disk a part of the file occupies. If your database files are very fragmented, performance will suffer. How much depends on a number of things however. XFS can store some extents directly in the inode (see\u2026","rel":"","context":"In &quot;linux-kernel&quot;","block_context":{"text":"linux-kernel","link":"https:\/\/www.flamingspork.com\/blog\/category\/linux-kernel\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1627,"url":"https:\/\/www.flamingspork.com\/blog\/2009\/05\/09\/does-linux-fallocate-zero-fill\/","url_meta":{"origin":861,"position":2},"title":"Does linux fallocate() zero-fill?","author":"Stewart Smith","date":"2009-05-09","format":false,"excerpt":"In an email disscussion for pre-allocating binlogs for MySQL (something we'll likely have to do for Drizzle and replication), Yoshinori brought up the excellent point of that in some situations you don't want to be doing zero-fill as getting up and running quickly is the most important thing. 
So what\u2026","rel":"","context":"In &quot;code&quot;","block_context":{"text":"code","link":"https:\/\/www.flamingspork.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":89,"url":"https:\/\/www.flamingspork.com\/blog\/2003\/04\/23\/xfs-and-other-cool-things\/","url_meta":{"origin":861,"position":3},"title":"XFS and other cool things","author":"Stewart Smith","date":"2003-04-23","format":false,"excerpt":"Been re-reading a lot of the XFS papers that are on the SGI website (http:\/\/oss.sgi.com\/projects\/xfs\/) and thinking more about what I want out of an object store. There are a lot of similar design goals (I think) yet some very different ways of implementing things. Having a large B+Tree full\u2026","rel":"","context":"In &quot;hons-project&quot;","block_context":{"text":"hons-project","link":"https:\/\/www.flamingspork.com\/blog\/category\/hons-project\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":705,"url":"https:\/\/www.flamingspork.com\/blog\/2006\/05\/31\/ha_file\/","url_meta":{"origin":861,"position":4},"title":"ha_file","author":"Stewart Smith","date":"2006-05-31","format":false,"excerpt":"In what I laughingly call \"spare time\" I started hacking on ha_file.cc, otherwise known as the FILE storage engine. My idea is relatively simple, I want to be able to store and access my photos from MySQL. 
I also want the storage to be relatively efficient and have the raw\u2026","rel":"","context":"In &quot;mysql&quot;","block_context":{"text":"mysql","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/mysql\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":761,"url":"https:\/\/www.flamingspork.com\/blog\/2006\/11\/22\/create-insert-select-drop-benchmark\/","url_meta":{"origin":861,"position":5},"title":"CREATE, INSERT, SELECT, DROP benchmark","author":"Stewart Smith","date":"2006-11-22","format":false,"excerpt":"Inspired by PeterZ's Opening Tables scalability post, I decided to try a little benchmark. This benchmark involved the following: Create 50,000 tables CREATE TABLE t{$i} (i int primary key) Insert one row into each table select * from each table drop each table I wanted to test file system impact\u2026","rel":"","context":"In &quot;General&quot;","block_context":{"text":"General","link":"https:\/\/www.flamingspork.com\/blog\/category\/general\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/861","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/comments?post=861"}],"version-history":[{"count":0,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/861\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/media?parent=861"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/categories?post=861"},{"taxonomy":"pos
t_tag","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/tags?post=861"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}