Archive for the ‘linux-kernel’ Category

Does linux fallocate() zero-fill?

Saturday, May 9th, 2009

In an email discussion about pre-allocating binlogs for MySQL (something we’ll likely have to do for Drizzle and replication), Yoshinori brought up the excellent point that in some situations you don’t want to be doing zero-fill, as getting up and running quickly is the most important thing.

So what does Linux do? Does it zero-fill, or behave sensibly and pre-allocate quickly?

Let’s look at the kernel:

Inside the fallocate implementation (fs/open.c):

    if (inode->i_op->fallocate)
            ret = inode->i_op->fallocate(inode, mode, offset, len);
    else
            ret = -EOPNOTSUPP;

and for ext4:
    /*
     * currently supporting (pre)allocate mode for extent-based
     * files _only_
     */
    if (!(EXT4_I(inode)->i_flags & EXT4_EXTENTS_FL))
            return -EOPNOTSUPP;

XFS has always done fast pre-allocate, so it’s not a problem there and the only other filesystems to currently support fallocate are btrfs and ocfs2 – which we don’t even have to start worrying too much about yet :)

But this is just kernel behaviour – I *think* libc ends up wrapping it, treating an EOPNOTSUPP from the kernel as “let me zero-fill” (which might be useful to check). Anybody want to check libc for me?

This was all on slightly post 2.6.30-rc3 (from git: 8c9ed899b44c19e81859fbb0e9d659fe2f8630fc)
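
A minimal sketch of what that wrapping could look like from user space, assuming Linux with glibc. The names `preallocate` and `demo_preallocate` are made up for illustration; they are not libc or MySQL functions. The idea is to ask the kernel first and only fall back to posix_fallocate(), which may end up writing to the file, when the filesystem says it can't do it:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate `len` bytes from offset 0.
 * Returns 0 if the kernel did the preallocation itself,
 *         1 if we had to fall back to posix_fallocate(),
 *        -1 on any other error (errno is set).
 */
int preallocate(int fd, off_t len)
{
    assert(len > 0);

    if (fallocate(fd, 0, 0, len) == 0)
        return 0;                       /* filesystem supports fallocate */

    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;                      /* a real error, e.g. ENOSPC */

    /* posix_fallocate() returns the error number instead of setting errno */
    int err = posix_fallocate(fd, 0, len);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 1;
}

/* Hypothetical demo: create a file, preallocate it, return its size. */
off_t demo_preallocate(const char *path, off_t len)
{
    struct stat st;
    off_t size = -1;
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    if (preallocate(fd, len) >= 0 && fstat(fd, &st) == 0)
        size = st.st_size;
    close(fd);
    unlink(path);
    return size;
}
```

Calling `demo_preallocate("/tmp/prealloc_demo.bin", 1048576)` should report a 1MB file either way; the return value of `preallocate` tells you which path was taken, which is the bit you care about if zero-filling is too slow for you.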

default filesystem and disk parameters are for wusses

Friday, March 28th, 2008

I can’t remember the last time i used default mkfs or mount options… oh yeah, that’s right – by accident.

Anyway… I did a little experiment today.

The filesystem is my laptop /home – XFS, 100GB, 95% used (so 5-6GB free), rather aged. This is where a lot of my MySQL development is done. Mkfs options: 128MB log, version2 log. Mount options: logbufs=8, logbsize=256k. All of this geared towards increasing metadata performance.

Why metadata performance? well… source code trees are a lot of metadata :)

So, let’s try some things: cloning a repository and then removing the repository.

Two variables are being tested. The first is mounting the file system with nobarrier (or barrier, the default); write barriers tell the disk to ensure write order to the platter when the write cache is in use. The second is disabling (or enabling, the default) the disk write cache.

cloneperf1.png

rmperf1.png

NOTE: the last option which has the write cache enabled and write barriers disabled is NOT SAFE. If your machine crashes, you lose data, and potentially your file system ends up corrupted.

So I’m now disabling my disk write cache and mounting with nobarrier.

If you use real disk arrays – e.g. battery backed write cache RAID boxes, the story is likely very different!

gah… O_DIRECT…. 2.4…. non xfs… stab stab

Tuesday, July 31st, 2007

* dchinner hands MacPlusG3 a bigger knife….

(on #xfs yesterday)

Larger inodes make for (some) happy apps

Monday, January 29th, 2007

Mikal talks about Ted talking about Tridge talking about how larger inodes can improve samba4 performance. Well, not just Samba4. Beagle and SELinux are also common heavy users of extended attributes which can often be stored inside the inode (e.g. on XFS).

There used to be the case where the Fedora installer would run mkfs.xfs with the default options and enable SELinux. Turns out this setup is great for systems without SELinux but the xattrs were large enough to require more space than what could fit in the inode, causing an extra block per inode to be allocated for xattrs. Not exactly space efficient.

The same can happen with Beagle. So if you’re using SELinux and/or Beagle and/or Samba4 – large inodes are probably going to be a winner for you.

I think we’re getting to the time where xattrs are popping up here there and everywhere for all sorts of applications and we’re going to have to find good (and efficient) ways of storing them.

I’m increasingly warming up to the idea of variable sized inodes. For a FS like XFS this could be done per group of inodes (XFS typically will, when more inodes are needed, create 64 inodes at once). File systems such as ext3 don’t really have this option as there is an inode table that is fixed and created at mkfs time. Although Ted has some interesting ideas for ext4 in this regard.

I’m sure Val Henson would have some interesting ideas for ChunkFS too…  a very interesting concept that I’ve been thinking about the possibility of retrofitting into existing systems (which I don’t think is that silly).

It would be great for some of the XFS dudes to write about the parallelising of checking an XFS file system.

CREATE, INSERT, SELECT, DROP benchmark

Wednesday, November 22nd, 2006

Inspired by PeterZ’s Opening Tables scalability post, I decided to try a little benchmark. This benchmark involved the following:

  • Create 50,000 tables
  • CREATE TABLE t{$i} (i int primary key)
  • Insert one row into each table
  • select * from each table
  • drop each table
    I wanted to test file system impact on this benchmark. So, I created a new LVM volume, 10GB in size. I extracted a ‘make bin-dist’ of a recent MySQL 5.1 tree, did a “mysql-test-run.pl --start-and-exit” and ran my script, measuring wall-clock time with time.
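
The steps listed above can be sketched as a small statement generator. This is not the script I ran, just an illustration in C: the function names are made up, and the inserted value (`1`) is an assumption since the list only says "one row". Statements are emitted phase by phase (all CREATEs, then all INSERTs and so on) so each phase can be timed on its own.

```c
#include <assert.h>
#include <stdio.h>

/* The four phases of the benchmark, in the order they run. */
enum phase { PHASE_CREATE, PHASE_INSERT, PHASE_SELECT, PHASE_DROP, PHASE_COUNT };

static const char *const phase_fmt[PHASE_COUNT] = {
    "CREATE TABLE t%d (i int primary key);",
    "INSERT INTO t%d VALUES (1);",          /* assumed row contents */
    "SELECT * FROM t%d;",
    "DROP TABLE t%d;",
};

/* Format the statement for table number i; returns its length. */
int format_statement(char *buf, size_t size, enum phase p, int i)
{
    assert((int)p < PHASE_COUNT);
    return snprintf(buf, size, phase_fmt[p], i);
}

/* Write every statement for `tables` tables; returns how many were written. */
long emit_benchmark(FILE *out, int tables)
{
    char buf[64];
    long written = 0;

    for (int p = 0; p < PHASE_COUNT; p++) {
        for (int i = 1; i <= tables; i++) {
            format_statement(buf, sizeof buf, (enum phase)p, i);
            fputs(buf, out);
            fputc('\n', out);
            written++;
        }
    }
    return written;
}
```

Something like `emit_benchmark(stdout, 50000)` piped into the mysql client would reproduce the workload, although timing each phase separately needs a bit more plumbing than this.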

    For a default ext3 file system creating MyISAM tables, the test took 15min 8sec.

    For a default xfs file system creating MyISAM tables, the test took 7min 20sec.

    For an XFS file system with a 100MB Version 2 log creating MyISAM tables, the test took 7min 32sec – which is within repeatability of the default XFS file system. So log size and version made no real difference.

    For a default reiserfs (v3) file system creating MyISAM tables, the test took 9m 44sec.

    For a ext3 file system with the dir_index option enabled creating MyISAM tables, the test took 14min 21sec.

    For an approximate measure of the CREATE performance…. ext3 and reiserfs averaged about 100 tables/second (although after the 20,000 mark, reiserfs seemed to speed up a little). XFS averaged about 333 tables/second. I credit this to XFS doing the “does this file already exist?” check with a b-tree lookup once the directory reaches a certain size.

    Interestingly, DROPPING the tables was amazingly fast on ext3 – about 2500/sec. XFS about 1000/sec. So ext3 can destroy easier than it can create while XFS keeps up to speed with itself.

    What about InnoDB tables? Well…

    ext3(default): 21m 11s

    xfs(default): 12m 48s

    ext3(dir_index): 21m 11s

    Interestingly the create rate for XFS was around 500 tables/second – half that of MyISAM tables.

    These are interesting results for those who use a lot of temporary tables or do lots of create/drop tables as part of daily life.

    All tests performed on a Western Digital 250GB 7200rpm drive in a 2.8GHz 800MHz FSB P4 with 2GB memory running Ubuntu 6.10 with HT enabled.

    At the end of the test, the ibdata1 file had grown to a little over 800MB – still enough to fit in memory. If we increased this to maybe 200,000 tables (presumably about a 3.2GB file) that wouldn’t fit in cache, then the extents of XFS would probably make it perform better when doing INSERT and SELECT queries as opposed to the list of blocks that ext3 uses. This is because the Linux kernel caches the mapping of in memory block to disk block lookup making the efficiency of this in the file system irrelevant for data sets less than memory size.

    So go tell your friends: XFS is still the coolest kid on the block.

    Disk allocation, XFS, NDB Disk Data and more…

    Monday, November 13th, 2006

    I’ve talked about disk space allocation previously, mainly revolving around XFS (namely because it’s what I use, it’s a sensible choice for large file systems and large files, and it has a nice suite of tools for digging into what’s going on).

    Most people write software that just calls write(2) (or libc things like fwrite or fprintf) to do file IO – including space allocation. Probably 99% of file IO is fine to do like this and the allocators for your file system get it mostly right (some more right than others). Remember, disk seeks are really really expensive so the fewer you have to do, the better (i.e. fragmentation==bad).

    I recently (finally) wrote my patch to use the xfsctl to get better allocation for NDB disk data files (datafiles and undofiles).
    patch at:
    http://lists.mysql.com/commits/15088

    This actually ends up giving us a rather nice speed boost in some of the test suite runs.

    The problem is:
    - two cluster nodes on 1 host (in the case of the mysql-test-run script)
    - each node has a complete copy of the database
    - ALTER TABLESPACE ADD DATAFILE / ALTER LOGFILEGROUP ADD UNDOFILE creates files on *both* nodes. We want to zero these out.
    - files are opened with O_SYNC (IIRC)

    The patch I committed uses XFS_IOC_RESVSP64 to allocate (unwritten) extents and then posix_fallocate to zero out the file (the glibc implementation of this call just writes zeros out).

    Now, ideally it would be beneficial (and probably faster) to have XFS do this in kernel. Asynchronously would be pretty cool too.. but hey :)

    The reason we don’t want unwritten extents is that NDB has some realtime properties, and futzing about with extents and the like in the FS during transactions isn’t such a good idea.

    So, this would lead me to try XFS_IOC_ALLOCSP64 – which doesn’t have the “unwritten extents” warning that RESVSP64 does. However, with the two processes writing the files out, I get heavy fragmentation. Even with a RESVSP followed by ALLOCSP I get the same result.

    So it seems that ALLOCSP re-allocates extents (even if it doesn’t have to) and really doesn’t give you much (didn’t do too much timing to see if it was any quicker).

    I’ve asked if this is expected behaviour on the XFS list… we’ll see what the response is (i haven’t had time yet to go read the code… i should though).

    So what improvement does this patch make? well, i’ll quote my commit comments:

    BUG#24143 Heavy file fragmentation with multiple ndbd on single fs
    
    If we have the XFS headers (at build time) we can use XFS specific ioctls
    (once testing the file is on XFS) to better allocate space.
    
    This dramatically improves performance of mysql-test-run cases as well:
    
    e.g.
    number of extents for ndb_dd_basic tablespaces and log files
    BEFORE this patch: 57, 13, 212, 95, 17, 113
    WITH this patch  :  ALL 1 or 2 extents
    
    (results are consistent over multiple runs. BEFORE always has several files
    with lots of extents).
    
    As for timing of test run:
    BEFORE
    ndb_dd_basic                   [ pass ]         107727
    real    3m2.683s
    user    0m1.360s
    sys     0m1.192s
    
    AFTER
    ndb_dd_basic                   [ pass ]          70060
    real    2m30.822s
    user    0m1.220s
    sys     0m1.404s
    
    (results are again consistent over various runs)
    
    similar for other tests (BEFORE and AFTER):
    ndb_dd_alter                   [ pass ]         245360
    ndb_dd_alter                   [ pass ]         211632

    So what about the patch? It’s actually really tiny:

    
    --- 1.388/configure.in	2006-11-01 23:25:56 +11:00
    +++ 1.389/configure.in	2006-11-10 01:08:33 +11:00
    @@ -697,6 +697,8 @@
    sys/ioctl.h malloc.h sys/malloc.h sys/ipc.h sys/shm.h linux/config.h \
    sys/resource.h sys/param.h)
    
    +AC_CHECK_HEADERS([xfs/xfs.h])
    +
     #--------------------------------------------------------------------
    # Check for system libraries. Adds the library to $LIBS
    # and defines HAVE_LIBM etc
    
    --- 1.36/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2006-11-03 02:18:41 +11:00
    +++ 1.37/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2006-11-10 01:08:33 +11:00
    @@ -18,6 +18,10 @@
    #include
    #include
    
    +#ifdef HAVE_XFS_XFS_H
    +#include <xfs/xfs.h>
    +#endif
    +
     #include "AsyncFile.hpp"
    
    #include
    @@ -459,6 +463,18 @@
    Uint32 index = 0;
    Uint32 block = refToBlock(request->theUserReference);
    
    +#ifdef HAVE_XFS_XFS_H
    +    if(platform_test_xfs_fd(theFd))
    +    {
    +      ndbout_c("Using xfsctl(XFS_IOC_RESVSP64) to allocate disk space");
    +      xfs_flock64_t fl;
    +      fl.l_whence= 0;
    +      fl.l_start= 0;
    +      fl.l_len= (off64_t)sz;
    +      if(xfsctl(NULL, theFd, XFS_IOC_RESVSP64, &fl) < 0)
    +        ndbout_c("failed to optimally allocate disk space");
    +    }
    +#endif
     #ifdef HAVE_POSIX_FALLOCATE
    posix_fallocate(theFd, 0, sz);
    #endif

    So get building your MySQL Cluster with the XFS headers installed and run on XFS for sweet, sweet disk allocation.

    Twinhan USB DTV dongle not working :(

    Tuesday, September 5th, 2006

    So after doing some research (read: using search engines with linux + product name), I came to the conclusion that a Twinhan USB2.0 DVB dongle would be the dongle for me. Yes – it’s small, compact and does digital tv without requiring a non-existent free PCI slot in my Shuttle MythTV box.

    Having had great success with my last bit of new hardware (a really cheap Logitech QuickCam Express or something) – plug it in and it “just works”. Oh Linux how you are better than Microsoft Windows for hardware usability!

    But this was not to be. It uses a vp7045 chipset, which has drivers both in Ubuntu 6.06 “Dapper” and in the latest v4l-dvb hg tree.

    But for the life of me I couldn’t get it to tune into any TV stations (for those of you who like using hardware and not just having expensive boxes around, you will appreciate how tuning into a TV station is rather important functionality for a TV card). So I started having a look around the interweb for possible answers.

    The best I could come up with was “are you sure you have all the cables plugged in” – yes, I was.

    So seeing as this is the first digital TV dongle in this house, I wondered if the signal just wasn’t getting here. I got a friend to bring around a spare digital set top box. It worked fine. Brilliantly in fact – it even worked with the shitty small antenna that came with the dongle. So the problem wasn’t the ability to receive a signal.

    I then came across this post to the linux-dvb list titled “New VP7045 with TDA10046 instead of MT352 (was: VP7045 tuner doesn’t work)”. Which really does hint at the problem!

    I could be one of the lucky ones with a new revision that uses the TDA10046 instead of the MT352! (after getting some debug info from the card out of the driver – it was reporting itself as v1.02, so quite possible).

    Maybe time to hack the dvb driver for it? Things seem pretty modular, so it couldn’t be too hard, right?

    Well, the vp7045-fe.c file is the front end (well, what it assumes is the front end) for the vp7045.c dongle. So all I really need to do is to get it to use the tda10046 frontend (under frontends/tda1004x.c) instead of the vp7045-fe.c fe code.

    Well, it seems as though the tda10046 is an i2c device while the vp7045-fe isn’t. Hrrm… I’ve never really done much with i2c, so this’ll be fun!

    I’ve currently managed to hack the driver so that we do some things to do with the tda chip – although I haven’t gotten it detecting the i2c adapter – which means we’re never going to get a front end! (in fact, when you plug in the device with my modified driver you get a “no frontend detected” message from the kernel).

    i’ve tried poking on the #linuxtv channel on freenode to no avail – so it seems like i’m on my own for a bit.

    A good way to spend midnight until 3am though :)

    I’ll probably end up doing the same tonight. Why? Because it’s just so much fun.

    Oh, and if anybody has any pointers – it would be appreciated.

    I am, of course, assuming the hardware itself isn’t faulty. I have no MS Windows system around to test on.

    Arjen’s MySQL Community Journal – HyperThreading? Not on a MySQL server…

    Wednesday, June 14th, 2006

    Arjen’s MySQL Community Journal – HyperThreading? Not on a MySQL server…

    I blame the Linux Process Scheduler. At least it’s better than the earlier 2.6 days where things would get shunted a lot from one “cpu” to the other “cpu” for no real reason.

    Newer kernel versions are probably better… but don’t even think of HT and pre-2.6 – that would be funny.

    DaveM on Ingo’s SMP lock validator

    Wednesday, May 31st, 2006

    DaveM talks about Ingo’s new SMP lock validator for linux kernel

    A note reminding me to go take a look and see what can be ripped out and placed into various bits of MySQL and NDB. Ideally, of course, it could be turned into a LD_PRELOAD for pthread mutexes.

    Anybody who wants to look deeper into it before I wake up again is welcome to (and tell me what they find)
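
To make the mutex idea a little more concrete, here is a rough sketch of the kind of bookkeeping a wrapper around mutex lock/unlock could do. This is not Ingo's code and I'm not claiming it is how the validator works internally; every name here is made up. It remembers "A was held while B was taken" and complains if it later sees B held while A is taken. It only catches direct two-lock inversions, and it keeps a single held-lock stack where a real tool would need one per thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 16

struct lock_tracker {
    bool before[MAX_LOCKS][MAX_LOCKS]; /* before[a][b]: a was held when b was taken */
    int held[MAX_LOCKS];               /* locks currently held, in acquisition order */
    int depth;
};

void tracker_init(struct lock_tracker *t)
{
    memset(t, 0, sizeof *t);
}

/*
 * Record that lock `id` is being taken. Returns false if this contradicts
 * an ordering seen earlier (or re-takes a lock that is already held).
 */
bool tracker_acquire(struct lock_tracker *t, int id)
{
    bool ok = true;

    assert(id >= 0 && id < MAX_LOCKS && t->depth < MAX_LOCKS);
    for (int i = 0; i < t->depth; i++) {
        int h = t->held[i];

        if (h == id) {
            ok = false;                /* recursive acquisition */
            continue;
        }
        if (t->before[id][h])
            ok = false;                /* earlier we saw id held, then h taken */
        t->before[h][id] = true;
    }
    t->held[t->depth++] = id;
    return ok;
}

/* Record that lock `id` has been released (any order is allowed). */
void tracker_release(struct lock_tracker *t, int id)
{
    for (int i = t->depth - 1; i >= 0; i--) {
        if (t->held[i] == id) {
            memmove(&t->held[i], &t->held[i + 1],
                    (size_t)(t->depth - i - 1) * sizeof t->held[0]);
            t->depth--;
            return;
        }
    }
}
```

Taking lock 1 then lock 2 in one place, and lock 2 then lock 1 somewhere else, makes the second `tracker_acquire` return false even though nothing actually deadlocked on that run, which is the whole appeal of this kind of check.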

    Beat on “state of the dolphin” (or: Why Software is never really ready until a .20 release)

    Wednesday, April 19th, 2006

    Beat Vontobel blogs about “fuþark: The silence of futhark and the state of the dolphin” which is basically about how he’s found that the 5.0.20 release of MySQL is when the 5.0 release is really starting to shine.

    This confirms my theory (that I’ve had for quite a while now… like years) that a software release is never really mature until it hits about .20 (that’s dot twenty, not dot two).

    When something reaches .10 (dot ten) it’s no longer going to be annoying for most uses, but .20 means that you’re going to be happy. Don’t ask me really why this is the case, but it is.

    Think about the 2.6 kernel (yes, Linux Kernel – honestly, you think i was talking about something else?). At about 2.6.10, it would no longer be a pain to use and get things going – everything was starting to be smooth. As we’re getting closer to .20, things are getting better too. Mind you, everything here does run 2.6 now (and so does my mum’s machine – which is always a good sign of something being ready). With 2.4 hitting .20 – you’d never even think about using 2.2, 2.4 was perfect (except when you wanted 2.6).

    GNOME (and everything attached to it) is getting to be a really good desktop – ever since about the 2.10 release I’ve been using just much more of the GNOMEy way of doing things because they’re actually getting useful and usable (don’t get me wrong, previous releases were good too – but a lot more things annoyed me). As the releases have progressed, I’m increasingly convinced that 2.20 will be the “we’re here” release. 2.14 is a lot better, but there’s still a bunch of stuff that has to be done before it’s totally kick-ass.

    There are no surprises in MySQL 4.0 (it’s past .20 – at .26 now). Everybody knows and trusts it. 4.1 is at 4.1.18 – which is about as good as a .20 and it’s a pretty happy release. But due to 4.0 being rather solid – a lot of people have just stuck there. We’re seeing a bunch move to 5.0 – but my theory is that this will be 5.0.20 or above. Hrrm… anybody see a pattern?

    MySQL 5.1 is at 5.1.10 (or so) and it’s stopped being annoying, and that great march towards a .20 is healthy and active.

    GCC 2.95 had a lot of respect for a very long time (now it’s just a bit old). Note that .95 is higher than .20 :)

    EMACS is at version 21, but ed is only at .2 (hrrm.. and which is used by more people as their editor i wonder).

    aptitude at 0.2.15 (getting to .20) – while apt is at 0.6.40 (above .20). RPM is only at 4.0.4 – so a bit to go there :)

    The version of postgresql is 7.5.9 over here… so getting to the .1 stage, but away from the .20. (now I’m going to watch comments fill up with postgresql guys going on about something, i just know it :) But there is 7.3.14 – a lot closer to .20!

    MythTV is at 0.19 – getting closer to the .20 release (it’s a lot better than even just a few releases ago).

    (versions here mostly taken from whatever ubuntu 5.04 has)

    Note that attempting to skip a whole bunch of versions and label your software 95, 98, 2003 or whatever doesn’t get you “.20” status. Neither does just skipping to “.20” automatically. It’s about hard work and removing annoying things (we tend to call them bugs).

    This is a really stupid metric of software maturity. It is, however, disturbingly accurate.

    really unstable laptop

    Tuesday, March 14th, 2006

    I’m currently getting hard crashes about five times a day.

    I thought it was the sound driver, as I got a crash during dist-upgrade (again) while on console and saw the backtrace. Basically looked like something bad happened when the sound was muted.

    So, running without sound muted – just turned down.

    Well, today, just crashed again. Since running X, no backtrace. ARRRGHHH.

    Also crashed when waking up too. ACPI stuff in the backtrace.

    Not a happy camper at the moment. I have work to do, not futzing around with trying to find out what the fuck is wrong with my laptop (probably software) when I should be running a stable system.

    I’ve already had to re-add all my liferea RSS feeds as liferea obviously isn’t doing the right thing (at least the version shipping with Ubuntu) with regard to writing the feeds file to disk.

    So, I’m trying to prepare presentations for our DevConf on an incredibly buggy and almost unusable OpenOffice.org on an unstable laptop.

    I think I’m going to have wine again with lunch.

    Microsoft’s file system patent upheld: ZDNet Australia: News: Software

    Wednesday, January 11th, 2006

    Microsoft’s file system patent upheld: ZDNet Australia: News: Software

    Saying any part of the FAT file system is “novel and non-obvious” is rather like saying being stabbed in the eye with a fork is “novel and a good way to spend a sunday afternoon”.

    Seriously – what the?

    I’m really glad I work for a company that opposes software patents.

    Thanks to Pia for the links.

    disk space allocation (part 4: allocating an extent)

    Tuesday, November 29th, 2005

    For XFS, in normal operation, an extent is only allocated when data has to be written to disk. This is called delayed allocation. If we are extending a file by 50MB – that space is deducted from the total free space on the filesystem, but no decision on where to place that data is made until we start writing it out, either due to memory pressure or because the kernel automatically starts writing the dirty pages out (the sync once every 5 seconds on Linux).

    When an extent needs to be allocated, XFS looks it up in one of two b+trees it has of free space. There is one sorted by starting block number (so you can search for “an extent near here”) and one by size (so you can search for “an extent of x size”).

    The ideal situation being that you want as large an extent as possible as close to the tail end of the file as possible (i.e. just making the current extent bigger).
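
The two kinds of lookup described above can be sketched with two sorted arrays standing in for the two b+trees. This is only an illustration of the search ("an extent of x size" versus "an extent near here"), not XFS code; the struct and function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* A run of free blocks: `len` blocks starting at block number `start`. */
struct free_extent {
    unsigned long start;
    unsigned long len;
};

/*
 * by_size is sorted by ascending len. Returns the index of the smallest
 * free extent with at least `want` blocks, or -1 if none is big enough.
 */
long find_by_size(const struct free_extent *by_size, size_t n, unsigned long want)
{
    size_t lo = 0, hi = n;

    assert(by_size != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (by_size[mid].len < want)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < n ? (long)lo : -1;
}

/*
 * by_start is sorted by ascending start. Returns the index of the free
 * extent whose start is closest to `target`, or -1 if there are none.
 */
long find_near(const struct free_extent *by_start, size_t n, unsigned long target)
{
    size_t lo = 0, hi = n;

    if (n == 0)
        return -1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (by_start[mid].start < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == n)
        return (long)(n - 1);          /* everything starts before target */
    if (lo == 0)
        return 0;                      /* everything starts at or after target */

    /* Pick whichever neighbour starts closer to the target block. */
    unsigned long after = by_start[lo].start - target;
    unsigned long before = target - by_start[lo - 1].start;
    return before <= after ? (long)(lo - 1) : (long)lo;
}
```

Keeping the same free space indexed both ways means neither question needs a full scan; the cost is that every allocation and free has to update both indexes.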

    The worst-case scenario is having to allocate extents to multiple files at once with all of them being written out synchronously (O_SYNC or memory pressure) as this will cause lots of small extents to be created.

    disk space allocation (part 3: storing extents on disk)

    Tuesday, November 29th, 2005

    Here I’m going to talk about how file systems store what part of the disk a part of the file occupies. If your database files are very fragmented, performance will suffer. How much depends on a number of things however.

    XFS can store some extents directly in the inode (see xfs_dinode.h). If I’m reading things correctly, this can be 2 extents per fork (data fork and attribute fork). If more than this number of extents are needed, a btree is used instead.

    HFS/HFS+ can store up to 8 extents directly in the catalog file entry (see Apple TechNote 1150 – which was updated in March 2004 with information on the journal format). If the file has more than 8 extents, a lookup then needs to be done into the extents overflow file. Interestingly enough, in MacOS X 10.4 and above (i think it was 10.4… may have been 10.3 as well) if a file is less than 20MB and has more than 8 extents, on an open, the OS will automatically try to defragment that file. Arguably you should just fix your allocation strategy, but hey – maybe this does actually help.

    File systems such as ext2, ext3 and reiserfs just store a list of block numbers. In the case of ext2 and ext3, the further into a file you are, the more steps are required to find the disk block number associated with that block in the file.
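
To put numbers on "the further into a file you are, the more steps": an ext2/ext3 inode holds 12 block numbers directly, then one pointer each to a single, double and triple indirect block, with 4-byte block numbers. A small sketch of the arithmetic (the function name is made up, and this is not code from the kernel):

```c
#include <assert.h>

#define EXT2_DIRECT_BLOCKS 12ULL  /* block numbers stored in the inode itself */
#define EXT2_POINTER_SIZE  4ULL   /* bytes per on-disk block number */

/*
 * How many indirect blocks have to be read to find the disk block for
 * logical block `file_block` of a file? Returns 0 to 3, or -1 if the
 * block is past what triple indirection can address.
 */
int ext2_indirection_depth(unsigned long long file_block,
                           unsigned long long block_size)
{
    unsigned long long ptrs, span = 1;

    assert(block_size >= EXT2_POINTER_SIZE);
    ptrs = block_size / EXT2_POINTER_SIZE;  /* pointers per indirect block */

    if (file_block < EXT2_DIRECT_BLOCKS)
        return 0;
    file_block -= EXT2_DIRECT_BLOCKS;

    for (int depth = 1; depth <= 3; depth++) {
        span *= ptrs;                       /* blocks reachable at this depth */
        if (file_block < span)
            return depth;
        file_block -= span;
    }
    return -1;
}
```

With 4kb blocks that works out to: the first 48kb of a file needs no extra reads, the next 4MB needs one, the next 4GB needs two, and anything beyond that needs three.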

    So what does an extent actually look like? Well, for XFS, the following excerpt from xfs_bmap_btree.h is interesting:

    #define ISUNWRITTEN(x) ((x)->br_state == XFS_EXT_UNWRITTEN)

    typedef struct xfs_bmbt_irec
    {
        xfs_fileoff_t br_startoff;   /* starting file offset */
        xfs_fsblock_t br_startblock; /* starting block number */
        xfs_filblks_t br_blockcount; /* number of blocks */
        xfs_exntst_t  br_state;      /* extent state */
    } xfs_bmbt_irec_t;

    It’s also rather self explanatory. Holes (for sparse files) in XFS don’t have extents, and an extent doesn’t have to have been written to disk. This allows you to preallocate space in chunks without having written anything to it. Reading from an unwritten extent gets you zeros (otherwise it would be a security hole!).
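
As a sketch of how records like that get used: given a list of extents, finding where a file block lives is a range check plus an addition. The types below are simplified stand-ins of my own (plain integers, made-up names), not the XFS definitions, and a linear scan stands in for whatever lookup the real code does.

```c
#include <assert.h>
#include <stddef.h>

enum ext_state { EXT_NORMAL, EXT_UNWRITTEN };

/* Simplified stand-in for an extent record. */
struct extent_rec {
    unsigned long long startoff;    /* starting file offset, in blocks */
    unsigned long long startblock;  /* starting disk block number */
    unsigned long long blockcount;  /* number of blocks */
    enum ext_state state;           /* written or preallocated-only */
};

enum map_result {
    MAP_HOLE,       /* no extent covers this block: reads return zeros */
    MAP_UNWRITTEN,  /* space is allocated but reads must still return zeros */
    MAP_DATA        /* real data lives at *disk_block */
};

/* Look up which disk block backs `file_block`, if any. */
enum map_result map_block(const struct extent_rec *ext, size_t n,
                          unsigned long long file_block,
                          unsigned long long *disk_block)
{
    assert(disk_block != NULL);
    for (size_t i = 0; i < n; i++) {
        if (file_block >= ext[i].startoff &&
            file_block - ext[i].startoff < ext[i].blockcount) {
            *disk_block = ext[i].startblock + (file_block - ext[i].startoff);
            return ext[i].state == EXT_UNWRITTEN ? MAP_UNWRITTEN : MAP_DATA;
        }
    }
    return MAP_HOLE;
}
```

Note that the number of steps depends on how many extents the file has, not on how big it is, which is why fragmentation is the thing to watch.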

    disk space allocation (part 2: examining your database files)

    Wednesday, November 23rd, 2005
    memberdb/log.MYD:
     EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
       0: [0..943]:        5898248..5899191  3 (36536..37479)     944
       1: [944..1023]:     6071640..6071719  3 (209928..210007)    80
       2: [1024..1127]:    6093664..6093767  3 (231952..232055)   104
       3: [1128..1279]:    6074800..6074951  3 (213088..213239)   152
       4: [1280..1407]:    6074672..6074799  3 (212960..213087)   128
       5: [1408..1423]:    6074264..6074279  3 (212552..212567)    16
    memberdb/log.MYI:
     EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
       0: [0..7]:          10165832..10165839  5 (396312..396319)     8
    

    The interesting thing about this is that the log table grows very slowly. This table stores a bunch of debugging output for my memberdb application. It should possibly be a partitioned ARCHIVE table (and probably will in the future).

    The thing about a file growing slowly over time is that it’s more likely to have more than 1 extent (I’ll examine why in the near future).

    My InnoDB data and log files only have 1 extent.. I think I’ve done an xfs_fsr on my file system though.

    disk space allocation (part 1: seeing what’s happened)

    Wednesday, November 23rd, 2005

    (a little while ago I was writing a really long entry on everything possible. I realised that this would be a long read for people and that fewer people would look at it, so I’ve split it up).

    This sprung out of doing work on the NDB disk data tree. Anything where efficient use of the filesystem is concerned tickles my fancy, so I went to have a look at what was going on.

    Filesystems store what part of the disk belongs to what file in one of two ways. The first is to keep a list of every disk block (typically 4kb) that’s being used by the file. A 400kb file will have 100 block numbers. The second way is to store a range (extent). That is, a 400kb file could use 100 blocks starting at disk block number 1000.

    XFS has a tool called xfs_bmap. It gives you a list of the extents allocated to a file.

    So, let’s have a look at what it tells us about some recordings on my MythTV box.

    myth@orpheus:~$ ls -lah myth-recordings/10_20050912183000_20050912190000.nuv
     -rw-r--r--  1 myth myth 452M 2005-09-12 19:00 myth-recordings/10_20050912183000_20050912190000.nuv
    myth@orpheus:~$ xfs_bmap -v myth-recordings/10_20050912183000_20050912190000.nuv
    myth-recordings/10_20050912183000_20050912190000.nuv:
     EXT: FILE-OFFSET       BLOCK-RANGE          AG AG-OFFSET             TOTAL
       0: [0..639]:         228712176..228712815  7 (21106232..21106871)    640
       1: [640..1663]:      83674040..83675063    2 (24358056..24359079)   1024
       2: [1664..923519]:   83675368..84597223    2 (24359384..25281239) 921856
       3: [923520..924031]: 84631272..84631783    2 (25315288..25315799)    512
    

    Just to make things fun, this is all in 512byte blocks. But anyway, the real interesting thing is the number of extents. Ideally, every file would have one extent as this means that we avoid disk seeks – *the* most expensive disk operation.
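
Since the units trip people up, here is a small sketch that parses one extent line in the format shown above and converts it to bytes. The struct and function names are mine, not part of any XFS tool or library, and it only handles lines that look like the ones above (a line describing a hole will not parse and returns -1).

```c
#include <assert.h>
#include <stdio.h>

#define BMAP_UNIT_BYTES 512ULL  /* xfs_bmap reports 512 byte blocks */

/* One extent line of `xfs_bmap -v` output. */
struct bmap_extent {
    int index;
    unsigned long long file_start, file_end;  /* FILE-OFFSET range */
    unsigned long long disk_start, disk_end;  /* BLOCK-RANGE */
    int ag;                                   /* allocation group */
    unsigned long long ag_start, ag_end;      /* AG-OFFSET range */
    unsigned long long total;                 /* TOTAL, in 512 byte blocks */
};

/* Parse a single extent line; returns 0 on success, -1 otherwise. */
int parse_bmap_line(const char *line, struct bmap_extent *e)
{
    int fields;

    assert(line != NULL && e != NULL);
    fields = sscanf(line, " %d: [%llu..%llu]: %llu..%llu %d (%llu..%llu) %llu",
                    &e->index, &e->file_start, &e->file_end,
                    &e->disk_start, &e->disk_end,
                    &e->ag, &e->ag_start, &e->ag_end, &e->total);
    return fields == 9 ? 0 : -1;
}

/* Size of the extent in bytes, from its file offset range. */
unsigned long long bmap_extent_bytes(const struct bmap_extent *e)
{
    return (e->file_end - e->file_start + 1) * BMAP_UNIT_BYTES;
}
```

For extent 2 above that gives 921856 × 512 = 471,990,272 bytes. As a sanity check on the whole file: the last offset is 924031, so 924032 × 512 = 473,104,384 bytes, or about 451.2MB, which lines up with the 452M that ls reports if it rounds up.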

    XFS also provides the xfs_fsr tool (File System Repacker) that can defragment files (even on a mounted file system). On IRIX this used to run out of cron – fun when a bunch of machines hit a CXFS volume all at the same time.

    LKML: Linus Torvalds: Re: [OT]Linus trademarks Linux?!!

    Wednesday, August 24th, 2005

    LKML: Linus Torvalds: Re: [OT]Linus trademarks Linux?!!

    thoughts on the trademark and notes about slashdot being a big public wanking session (which is, if nothing else – quite accurate and quite funny)

    An old year-2000 mail about the same stuff

    log based file system

    Friday, May 20th, 2005

    I think this can be done – with guaranteed consistency – fairly efficiently.

    would love to do some experiments and see what performance i could get.

    write performance could be spectacular…

    there’s some ideas floating in my head for read performance optimisation – i wonder if any of them make any sense.

    Feature: No More Free BitKeeper

    Wednesday, April 6th, 2005

    Feature: No More Free BitKeeper

    Insert inspired-by-RMS rant about non-free software owning you.

    I don’t know what the implications of this are going to be… but something worth reading and thinking about.

    RT2500 wireless PCI card on Ubuntu

    Friday, January 14th, 2005

    Got the two cards today. Ordered from i-Tech (mob in Sydney, had it delivered here). Were $59AUD each (plus shipping, which was $15 for the two of them).

    Really painless setup!

    One was for the Ubuntu system my mum uses, the other for the Windows system my brother uses. Well, the Ubuntu setup was easier than the windows one (try to get Windows to tell you the MAC address of the adapter… well… *of course* it’s under “Support” – where else would it be?).

    I got the drivers from CVS from http://rt2x00.serialmonkey.com as the CVS ones have a few more fixes (makes it easier to build for one).

    I got the following packages:
    build-essential
    cvs
    linux-source-(whatever version it was).

    cd /usr/src
    tar xfj linux-source-whatever.tar.bz2
    ln -s /usr/src/linux-whatever /lib/modules/the-right-version-number/build
    cd /usr/src/linux-whatever
    cp /boot/config-whatever .config
    make modules

    (as long as it builds the first few you’re fine and can ctrl-c the rest)

    then i got the CVS drivers and built it like their docs say (make with the -C parameters).

    depmod -a

    then used the GUI tool to set it up (the Ubuntu one). The ralink graphical utility (install the kde-devel package to build it) lets you monitor link quality etc.

    so, success!