Microsoft’s file system patent upheld: ZDNet Australia: News: Software

Saying any part of the FAT file system is “novel and non-obvious” is rather like saying being stabbed in the eye with a fork is “novel and a good way to spend a Sunday afternoon”.

Seriously – what the?

I’m really glad I work for a company that opposes software patents.

Thanks to Pia for the links.

disk space allocation (part 4: allocating an extent)

For XFS, in normal operation, an extent is only allocated when data has to be written to disk. This is called delayed allocation. If we extend a file by 50MB, that space is deducted from the total free space on the filesystem straight away, but no decision on where to place the data is made until we start writing it out, either because of memory pressure or because the kernel starts flushing dirty pages on its own (the sync once every 5 seconds on Linux).

When an extent needs to be allocated, XFS looks it up in one of two b+trees it has of free space. There is one sorted by starting block number (so you can search for “an extent near here”) and one by size (so you can search for “an extent of x size”).
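To make the two lookups concrete, here is a rough sketch in Python. It is not how XFS implements anything: two sorted lists stand in for the two b+trees, and the class name, method names and sample extents are all made up for illustration.

```python
import bisect


class FreeSpaceIndex:
    """Toy free-space index: two sorted lists stand in for the two b+trees."""

    def __init__(self, extents):
        # extents: iterable of (start_block, length) pairs (hypothetical data)
        extents = list(extents)
        self.by_start = sorted(extents)
        self.by_size = sorted((length, start) for start, length in extents)

    def nearest(self, block):
        """'An extent near here': the free extent whose start is closest to block."""
        i = bisect.bisect_left(self.by_start, (block, 0))
        candidates = self.by_start[max(i - 1, 0):i + 1]
        return min(candidates, key=lambda e: abs(e[0] - block))

    def at_least(self, length):
        """'An extent of x size': the smallest free extent with >= length blocks."""
        i = bisect.bisect_left(self.by_size, (length, 0))
        if i == len(self.by_size):
            return None  # nothing big enough
        size, start = self.by_size[i]
        return (start, size)


index = FreeSpaceIndex([(100, 8), (500, 64), (2000, 16), (9000, 1024)])
print(index.nearest(520))    # (500, 64)
print(index.at_least(32))    # (500, 64)
print(index.at_least(5000))  # None
```

Keeping the same extents in two orderings costs extra bookkeeping on every allocation and free, but it means both kinds of question can be answered with a logarithmic search instead of a scan.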

The ideal situation is to get as large an extent as possible, as close to the tail end of the file as possible (i.e. just making the current extent bigger).

The worst-case scenario is having to allocate extents to multiple files at once with all of them being written out synchronously (O_SYNC or memory pressure) as this will cause lots of small extents to be created.

disk space allocation (part 3: storing extents on disk)

Here I’m going to talk about how file systems store what part of the disk a part of the file occupies. If your database files are very fragmented, performance will suffer. How much it suffers depends on a number of things, however.

XFS can store some extents directly in the inode (see xfs_dinode.h). If I’m reading things correctly, this can be 2 extents per fork (data fork and attribute fork). If more than this number of extents are needed, a btree is used instead.

HFS/HFS+ can store up to 8 extents directly in the catalog file entry (see Apple TechNote 1150, which was updated in March 2004 with information on the journal format). If the file has more than 8 extents, a lookup then needs to be done into the extents overflow file. Interestingly enough, in MacOS X 10.4 and above (I think it was 10.4; it may have been 10.3 as well), if a file is less than 20MB and has more than 8 extents, the OS will automatically try to defragment it on open. Arguably you should just fix your allocation strategy, but hey, maybe this does actually help.

File systems such as ext2, ext3 and reiserfs just store a list of block numbers. In the case of ext2 and ext3, the further into a file you are, the more steps are required to find the disk block number associated with that block in the file.
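As a rough illustration of those extra steps, the sketch below counts how many indirect blocks have to be read before the disk block number is known. It assumes the classic ext2 layout (12 direct pointers in the inode, then single, double and triple indirect blocks) with 4KB blocks and 4-byte block numbers; the function name and defaults are mine, not anything from the ext2 source.

```python
def ext2_lookup_steps(file_block, block_size=4096, pointer_size=4, direct=12):
    """Return how many indirect blocks must be read to locate file_block.

    0 means the block number is stored directly in the inode.
    """
    per_block = block_size // pointer_size  # block numbers held by one indirect block
    if file_block < direct:
        return 0
    file_block -= direct
    for level in (1, 2, 3):  # single, double, triple indirect
        span = per_block ** level  # file blocks reachable at this level
        if file_block < span:
            return level
        file_block -= span
    raise ValueError("block index beyond the triple-indirect range")


print(ext2_lookup_steps(0))          # 0: within the first 48KB of the file
print(ext2_lookup_steps(12))         # 1: just past the direct pointers
print(ext2_lookup_steps(12 + 1024))  # 2: just past 4MB + 48KB into the file
```

With these assumptions, anything past roughly 4GB into a file needs three extra block reads per lookup (unless they are cached), which is the cost an extent-based layout avoids.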

So what does an extent actually look like? Well, for XFS, the following excerpt from xfs_bmap_btree.h is interesting:

#define ISUNWRITTEN(x) ((x)->br_state == XFS_EXT_UNWRITTEN)

typedef struct xfs_bmbt_irec
{
        xfs_fileoff_t   br_startoff;    /* starting file offset */
        xfs_fsblock_t   br_startblock;  /* starting block number */
        xfs_filblks_t   br_blockcount;  /* number of blocks */
        xfs_exntst_t    br_state;       /* extent state */
} xfs_bmbt_irec_t;

It’s also rather self-explanatory. Holes (for sparse files) in XFS don’t have extents, and an extent doesn’t have to have been written to disk. This allows you to preallocate space in chunks without having written anything to it. Reading from an unwritten extent gets you zeros (otherwise it would be a security hole!).
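Here is a small Python model of how a read could be resolved against a list of such records. It only mirrors the four fields of the struct above; the class, the function names and the dict standing in for the disk are all hypothetical, and real XFS looks extents up in the inode or a btree rather than scanning a list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extent:
    startoff: int            # starting file offset, in blocks (br_startoff)
    startblock: int          # starting disk block number (br_startblock)
    blockcount: int          # number of blocks (br_blockcount)
    unwritten: bool = False  # preallocated but never written (br_state)


def find_extent(extents, file_block) -> Optional[Extent]:
    """Return the extent covering file_block, or None if it falls in a hole."""
    for ext in extents:
        if ext.startoff <= file_block < ext.startoff + ext.blockcount:
            return ext
    return None


def read_block(extents, file_block, disk, block_size=4096):
    """Read one file block; holes and unwritten extents both read as zeros."""
    ext = find_extent(extents, file_block)
    if ext is None or ext.unwritten:
        return bytes(block_size)
    return disk[ext.startblock + (file_block - ext.startoff)]


# File blocks 0-3 are written, 4-7 are a hole, 8-11 are preallocated only.
extents = [Extent(0, 1000, 4), Extent(8, 2000, 4, unwritten=True)]
disk = {1002: b"x" * 4096}  # toy "disk": block number -> contents

print(read_block(extents, 2, disk)[:4])  # b'xxxx'
print(read_block(extents, 5, disk)[:4])  # b'\x00\x00\x00\x00' (hole)
print(read_block(extents, 9, disk)[:4])  # b'\x00\x00\x00\x00' (unwritten)
```

The unwritten case is the interesting one: the disk blocks are reserved and may still hold someone else's old data, so the state flag is what stops that data leaking out.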

disk space allocation (part 2: examining your database files)

memberdb/log.MYD:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..943]:        5898248..5899191  3 (36536..37479)     944
   1: [944..1023]:     6071640..6071719  3 (209928..210007)    80
   2: [1024..1127]:    6093664..6093767  3 (231952..232055)   104
   3: [1128..1279]:    6074800..6074951  3 (213088..213239)   152
   4: [1280..1407]:    6074672..6074799  3 (212960..213087)   128
   5: [1408..1423]:    6074264..6074279  3 (212552..212567)    16
memberdb/log.MYI:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..7]:          10165832..10165839  5 (396312..396319)     8

The interesting thing about this is that the log table grows very slowly. This table stores a bunch of debugging output for my memberdb application. It should possibly be a partitioned ARCHIVE table (and probably will be in the future).

The thing about a file growing slowly over time is that it’s more likely to have more than 1 extent (I’ll examine why in the near future).

My InnoDB data and log files only have 1 extent each. I think I’ve run xfs_fsr on my file system, though.

disk space allocation (part 1: seeing what’s happened)

(A little while ago I was writing a really long entry on everything possible. I realised that this would be a long read and that fewer people would look at it, so I’ve split it up.)

This sprang out of doing work on the NDB disk data tree. Anything where efficient use of the filesystem is concerned tickles my fancy, so I went to have a look at what was going on.

Filesystems store what part of the disk belongs to what file in one of two ways. The first is to keep a list of every disk block (typically 4kb) that’s being used by the file. A 400kb file will have 100 block numbers. The second way is to store a range (extent). That is, a 400kb file could use 100 blocks starting at disk block number 1000.
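The difference between the two representations is easy to see in a few lines of Python. This is only an illustration: the function name and the block numbers are invented, and a real file system never converts between the two like this.

```python
def blocks_to_extents(blocks):
    """Collapse a list of disk block numbers into (start, count) extents."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # This block directly follows the previous extent: grow it.
            start, count = extents[-1]
            extents[-1] = (start, count + 1)
        else:
            # Not contiguous: start a new extent.
            extents.append((block, 1))
    return extents


# A 400kb file in 4kb blocks, laid out contiguously from disk block 1000.
block_list = list(range(1000, 1100))
print(len(block_list))                # 100 block numbers
print(blocks_to_extents(block_list))  # [(1000, 100)]

# The same idea for a fragmented file.
print(blocks_to_extents([1000, 1001, 1002, 5000, 5001]))  # [(1000, 3), (5000, 2)]
```

The block list always grows with the file, while the extent list only grows with the number of fragments.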

XFS has a tool called xfs_bmap. It gives you a list of the extents allocated to a file.

So, let’s have a look at what it tells us about some recordings on my MythTV box.

myth@orpheus:~$ ls -lah myth-recordings/10_20050912183000_20050912190000.nuv
 -rw-r--r--  1 myth myth 452M 2005-09-12 19:00 myth-recordings/10_20050912183000_20050912190000.nuv
myth@orpheus:~$ xfs_bmap -v myth-recordings/10_20050912183000_20050912190000.nuv
myth-recordings/10_20050912183000_20050912190000.nuv:
 EXT: FILE-OFFSET       BLOCK-RANGE          AG AG-OFFSET             TOTAL
   0: [0..639]:         228712176..228712815  7 (21106232..21106871)    640
   1: [640..1663]:      83674040..83675063    2 (24358056..24359079)   1024
   2: [1664..923519]:   83675368..84597223    2 (24359384..25281239) 921856
   3: [923520..924031]: 84631272..84631783    2 (25315288..25315799)    512

Just to make things fun, this is all in 512-byte blocks. But anyway, the really interesting thing is the number of extents. Ideally, every file would have one extent, as this means that we avoid disk seeks, *the* most expensive disk operation.
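If you want to sanity-check the numbers, a few lines of Python will do. This is a quick sketch that parses the `xfs_bmap -v` lines pasted above with a regular expression written for exactly this output; the regex and the function name are my own, and other xfs_bmap versions may print extra columns that it does not handle.

```python
import re

BMAP_OUTPUT = """\
   0: [0..639]:         228712176..228712815  7 (21106232..21106871)    640
   1: [640..1663]:      83674040..83675063    2 (24358056..24359079)   1024
   2: [1664..923519]:   83675368..84597223    2 (24359384..25281239) 921856
   3: [923520..924031]: 84631272..84631783    2 (25315288..25315799)    512
"""

# EXT: [FILE-OFFSET]: BLOCK-RANGE AG (AG-OFFSET) TOTAL
LINE = re.compile(
    r"^\s*(\d+): \[(\d+)\.\.(\d+)\]:\s+(\d+)\.\.(\d+)\s+(\d+)"
    r"\s+\((\d+)\.\.(\d+)\)\s+(\d+)\s*$"
)


def parse_bmap(text):
    """Return one dict per extent line; lines that don't match are skipped."""
    extents = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            n, off_lo, off_hi, blk_lo, blk_hi, ag, _, _, total = map(int, m.groups())
            extents.append({"ext": n, "file": (off_lo, off_hi),
                            "disk": (blk_lo, blk_hi), "ag": ag, "total": total})
    return extents


extents = parse_bmap(BMAP_OUTPUT)
total_blocks = sum(e["total"] for e in extents)
size_bytes = total_blocks * 512  # xfs_bmap reports 512-byte blocks

print(len(extents), total_blocks, size_bytes)  # 4 924032 473104384
```

473104384 bytes is about 451.2MB (in units of 1024 × 1024 bytes), which is consistent with the 452M that ls -lah printed, given that ls rounds up.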

XFS also provides the xfs_fsr tool (File System Repacker) that can defragment files (even on a mounted file system). On IRIX this used to run out of cron – fun when a bunch of machines hit a CXFS volume all at the same time.

log based file system

I think this can be done, with guaranteed consistency, fairly efficiently.

would love to do some experiments and see what performance i could get.

write performance could be spectacular…

there’s some ideas floating in my head for read performance optimisation – i wonder if any of them make any sense.

RT2500 wireless PCI card on Ubuntu

Got the two cards today. Ordered from i-Tech (mob in Sydney, had it delivered here). Were $59AUD each (plus shipping, which was $15 for the two of them).

Really painless setup!

One was for the Ubuntu system my mum uses, the other for the Windows system my brother uses. Well, the Ubuntu setup was easier than the Windows one (try to get Windows to tell you the MAC address of the adapter… well… *of course* it’s under “Support”, where else would it be?).

I got the drivers from CVS from http://rt2x00.serialmonkey.com as the CVS ones have a few more fixes (they’re easier to build, for one).

I got the following packages:
build-essential
cvs
linux-source-(whatever version it was).

cd /usr/src
tar xfj linux-source-whatever.tar.bz2
ln -s /lib/modules/the-right-version-number/build /usr/src/linux-whatever
cd /usr/src/linux-whatever
cp /boot/config-whatever .config
make modules

(as long as it builds the first few you’re fine and can ctrl-c the rest)

Then I got the CVS drivers and built them like their docs say (make with the -C parameters).

depmod -a

then used the GUI tool to set it up (the Ubuntu one). The ralink graphical utility (install the kde-devel package to build it) lets you monitor link quality etc.

so, success!

Free Software Wish List

This has been gathering in my brain, I figure I should write it all down:

X

  • Render everything using Composite and OpenGL
    basically then we can have output that doesn’t suck! Translucency is not only cool, but useful in some UI.
  • All 2D graphics to be drawn with Cairo.
    Enough said here – vector is the future.
  • A magnify screen function (look at MacOS X’s) except using Cairo et al. so that everything is still smooth when you zoom (use those vector graphics baby, yeah!)
  • Graphics cards companies to pull their finger out and do full open source drivers.

GNOME (note that this is only long because I love it so much and spend so much time using it)

  • To be able to set emblems on files/folders in Nautilus via the contextual menu.
  • To have the Create Archive option in the contextual menu have a submenu with options such as “.tar, .tar.gz, .tar.bz2, .zip” (or just .gz, .bz2 if only a single file is selected)
  • Take less time to log in
  • Evolution to not leak memory.
  • Evolution to handle big maildirs better (where big is multiple hundreds of thousands of messages)
  • For Evolution to not do “checking” stuff on mailboxes.
  • Gaim getting its contact list from Evolution
  • Nautilus having better graphics for open versus closed folders (at least in the theme I use – Industrial)
  • The applications menu to be faster
  • Get rid of the Window List and Virtual Desktop – they are broken UI elements. Windows 95 proved that the taskbar just doesn’t scale when you have enough memory to run more than one application. Maybe a NeXT style dock would be good? I don’t have the answer here
  • Dashboard and Beagle to become easily installable and usable. If there are issues with shipping mono apps as part of core gnome, then let’s rewrite them in something that isn’t mono. I want that functionality!
  • gThumb to become good – think iPhoto on steroids with links into Gimp. Also, some sane way to store metadata
  • Multisync to work properly with evo2
  • Multisync to sync photos (and their metadata)
  • GnuCash to be GTK2
  • xchat to get some HIG UI love
  • All settings that a user could care about to be in a user-visible folder and able to be easily backed up (e.g. by dragging to a blank CD with nautilus-cd-burner). i.e. put everything in a folder called “Settings” instead of buried around in dotfiles.
  • A good backup utility that my mother can use (that’s smart enough to split things over multiple CDs if needed)
  • GUI for ACLs
  • Open With to be file specific as well as global
  • Animation with UI events (drool at OSX’s effects, then make better ones)
  • A good RAD dev environment. Something involving Glade and Python and integrated. Think Visual Basic 3 (when it was actually good) but on steroids with GNOME love.
  • Gnome Time Tracker to scale better (and not corrupt its own data files, i.e. use rename and sync properly)
  • Rhythmbox to have iPod (or any MP3/Ogg player) integration. I want to plug my ipod in, and see it in both Nautilus and Rhythmbox (and have a big Sync button in RB, as well as being able to drag files to it).
  • The desktop background to be Xinerama aware and know not to stretch an image over both screens. let me set one for each screen!
  • Have “random” desktop backgrounds from a folder
  • a desktop background option to better “fill” the screen on widescreens (where the image isn’t widescreen)
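On the “use rename and sync properly” wish for Gnome Time Tracker: the usual pattern is to write the new contents to a temporary file in the same directory, fsync it, and then rename it over the old file, so a crash leaves either the old file or the new one and never a half-written mix. Here is a minimal sketch in Python, assuming a POSIX system; the function name is made up and this is not code from Gnome Time Tracker.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace path with data so readers see the old or new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same file system for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the data on disk before the rename
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # don't leave temp files behind on failure
        raise
    # fsync the directory too, so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Usage is just atomic_write("timelog.dat", new_contents), where the file name is only an example.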

General Utils

  • xfsdump to get DVD support (multi volume dumps directly to DVD)
  • g++ to be faster and use less than a squigabyte of memory
  • prism54 to have proper link monitoring (with the gnome panel applet)
  • GUI version of kismet

there’s more… i just can’t be bothered writing any more at the moment :)

reiser4

I’m trying to go through and understand parts of the reiser4 code – specifically where and how blocks are freed.

I thought that maybe as a little introductory thing I could try and implement a “secure delete” option, i.e. one that basically just writes rubbish over a block just before freeing it.

I have a feeling I’m going to have to ask some questions on the reiserfs-list :)

linux-2.4.22-ben2-stew1-xfs available

Okay, I admit that the -stew1 part is pure vanity – but hey, that’s what EXTRAVERSION is for, right?

A patch against stock 2.4.22 which gives you:
– benh fixes and features
– a fix for that annoying message on console when changing brightness (brightness down seems to generate a newline for me still, but it’s better than annoying text)
– XFS file system from SGI.

I have .debs if people are interested, can upload.

patch-2.4.22-ben2-stew1-xfs.bz2

Other useful things to note:
– Firewire now works for me, I’ve even burnt a BeOS CD!
– A newer version of the HFS+ driver is out, head on over to
http://www.ardistech.com/hfsplus/
to grab a copy. See Roman’s recent LKML post for details.

as always, feedback welcome.

enjoy :)

potential fix for keyboard scancode errors

on my ibook2 (500mhz), while on console and changing brightness, i get a “keyboard: unknown scancode e0 4c” (or e0 54 if going up brightness) log message. Changing the brightness still works though….

This patch may help…..

--- linux-2.4.21-ben2-xfs.recent/drivers/macintosh/mac_hid.c    2002-08-03 10:39:44.000000000 +1000
+++ linux-2.4.21-ben2-xfs.recent-stew1/drivers/macintosh/mac_hid.c      2003-09-09 00:20:13.000000000 +1000
@@ -207,8 +207,8 @@
        KEY_RIGHTALT, KEY_BRIGHTNESSUP, KEY_BRIGHTNESSDOWN,
                KEY_EJECTCD, 0, 0, 0, 0,                        /* 0x38-0x3f */
        0, 0, 0, 0, 0, 0, 0, KEY_HOME,                          /* 0x40-0x47 */
-       KEY_UP, KEY_PAGEUP, 0, KEY_LEFT, 0, KEY_RIGHT, 0, KEY_END, /* 0x48-0x4f */
-       KEY_DOWN, KEY_PAGEDOWN, KEY_INSERT, KEY_DELETE, 0, 0, 0, 0, /* 0x50-0x57 */
+       KEY_UP, KEY_PAGEUP, 0, KEY_LEFT, KEY_BRIGHTNESSDOWN, KEY_RIGHT, 0, KEY_END, /* 0x48-0x4f */
+       KEY_DOWN, KEY_PAGEDOWN, KEY_INSERT, KEY_DELETE, KEY_BRIGHTNESSUP, 0, 0, 0, /* 0x50-0x57 */
        0, 0, 0, KEY_LEFTMETA, KEY_RIGHTMETA, KEY_COMPOSE, KEY_POWER, 0, /* 0x58-0x5f */
        0, 0, 0, 0, 0, 0, 0, 0,                                 /* 0x60-0x67 */
        0, 0, 0, 0, 0, 0, 0, KEY_MACRO,                         /* 0x68-0x6f */

new 2.4.21-ben2-xfs patch

patch-2.4.21-ben2-xfs.recent.bz2

Apply in the usual way, this one is against 2.4.21 stock, so it includes both BenH patches and the XFS patches. The XFS patches are: SGI XFS snapshot-2.4.21-2003-06-23_01:45_UTC as opposed to my last effort with XFS 1.1 (i think).

This patch is pretty similar (if not identical) to what Lashi cooked up. We’re both running this patch (or at least kernels based on it) and things are going well.

My firewire doesn’t really work, but his does. hrrmmm….. haven’t had time to go through the problem yet.

linux on an ibook

well, i’ve got the Linux and the MacOS X on the ibook now. Finally had enough things to do that the procrastination value of moving 25GB worth of home directory data around was worth it.

This, of course, had to be accomplished by buying another hard drive. So, an extra 120GB of storage has found its way into my SMP machine.

Dual PII 350mhz, 128mb RAM, 9GB Ultra SCSI and 4GB Ultra SCSI and a 120GB ATA. On a shitty ATA controller so the IO rates aren’t that good, but hey, next time i’m at a swap meet i’ll go get one that isn’t as ‘eek’.

Stock 2.6 isn’t that great on the ibook. It doesn’t so much sleep, as crash. Internal audio is broken, and i don’t have X acceleration going. I’m currently building 2.4.21-xfs-benh. There were a few things that didn’t quite patch in properly when i did the benh patch on top of the xfs patch, but i’m getting there. I’ll post a patch here when i know it boots (and hardware works).

patch sequence: xfs-only, xfs-kernel, benh, xfs-quota32

Without the DRM acceleration, i can *almost* play DVDs/DivX. It’s soooo close to viewable.

got my lit review to go under latex on linux now too… image issues. and for some reason, the cssethesis template craps itself with pdflatex on linux, didn’t on OSX.

towards stew’s kernel 2.4.21-ac2-stew1

well, spank my arse and call me charlie – stoopid me had not enabled 1284 modes for parallel port.

this is probably why my newly acquired laserjet printer doesn’t work via parallel (but fine, albeit slowly, via serial).

-ac2 should correct some problems tim was having with the tulip and emu10k1 modules. Well, at least the emu10k1 problems….

He’s also done a pretty good intro to kernel compiling (http://members.datafast.net.au/tmccoy/kernel_compile.html) along with the other easy-to-understand "How-To’s" that he’s put together.

So, here I wait for the kernel to build again…. dammit I want a faster box. Anyone willing to donate a nice new athlon?

Stew’s kernel for Debian Stable and Unstable!

finally built it for stable as well now. It actually works too (this is what’s powering my gateway). My linux workstation is being powered by Stew’s kernel too (the Unstable one).

.debs for Debian Stable 3.0 (Woody) are in /linux/kernel/debs/stable/

.debs for Debian testing/unstable (sid) are in /linux/kernel/debs/unstable/

sources are in /linux/kernel/debs/

Both are built with GCC 2.95, so NVIDIA and CISCO should play nicely.

I’m not sure about APM…. My SMP box says “APM disabled: not SMP safe” but i don’t know if this is just because I have two processors :)

The stable machine is reporting “apm: overridden by ACPI”, as I would expect it to… ’cause ACPI actually *works* with the -ac patches.. :)

Good news is, Marcelo is coming to his senses and 2.4.22 should be a lot better (and here soon).

REMEMBER TO INSTALL devfsd!!!! Otherwise you’ll have ickyness. I really should have put it in the “depends” thingy. Oops.