Pogoplug as a NAS

A while ago (April) I bought a Pogoplug with the explicit idea of using it as a NAS device. I finally bought a new 2TB drive and plugged in the pogoplug. I pretty much instantly realised I was going to run Debian on it instead, if only because that’s what makes me comfortable: Debian, ssh and XFS.

Installing Debian was easy (google it) and incredible props for not only an attractive looking device, but an easily hackable one (the default software is probably quite good for non-experts… I just happen to want Debian).

I have to say, I’m so far rather happy.

LCA Miniconf Call for Papers: Data Storage: Databases, Filesystems, Cloud Storage, SQL and NoSQL

This miniconf aims to cover many of the current methods of data storage and retrieval and attempt to bring order to the universe. We’re aiming to cover what various systems do, what the latest developments are and what you should use for various applications.

We aim for talks from developers of and developers using the software in question.

Aiming for some combination of: PostgreSQL, Drizzle, MySQL, XFS, ext[34], Swift (open source cloud storage, part of OpenStack), memcached, TokyoCabinet, TDB/CTDB, CouchDB, MongoDB, Cassandra, HBase….. and more!

Call for Papers open NOW (Until 22nd October).

SetFileValidData Function (Windows) – Now with added FAIL

SetFileValidData Function (Windows)

There seems to be two options on Win32 for preallocating disk space to files.

Basically, I want a equivilent to posix_fallocate or the ever wonderful xfsctl XFS_IOC_RESVSP64 call.

The idea being to (quickly) create a large file on disk that is stored efficiently (i.e. isn’t fragmented).

From SQL, you’d do something like “CREATE LOGFILE GROUP lg1 ADD UNDOFILE ‘uf1’ INITIAL_SIZE 1G;” and expect a 1GB file on disk. One way of getting this is calling write() (or WriteFile() on Win32) repeatedly until you’ve written a 1GB file full of zeros. This means you’re generating approximately 1GB of IO.

Except it’s worse than that: every time you extend the file, you’re going to be changing the metadata (file and free space information). If you’re lucky, you won’t be using a file system that writes a new transaction to the journal for each time you do this.

If your file system allocator doesn’t like you today (even more likely when you’ve got more than one process doing IO), you may end up with rather fragmented files as well – especially if you’re doing synchronous IO. So you want some method of saying “this file will be size X, please allocate disk space to it in the most efficient way for a file size of X” as it’s not possible to infer this from everyday IO calls (I guess the Win32 CopyFile and CopyFileEx calls could though).

It probably doesn’t do it, but having a CopyFile call would be neat for copy on write file systems and saving space… although I wonder how many Win32 apps would cope with ENOSPC on a write to an existing part of a file.

On IRIX we used the magic xfsctl() with the XFS_IOC_RESVSP64 argument. On Linux (with XFS), we use the same. On ext2/ext3 the only way to get the same has been to (with the file system unmounted), parse the file system and implement it yourself. Although (and this just in) the brand new fallocate() call should help with this. The posix_fallocate() call in GNU libc has just been a wrapper around the simple method of writing 0 to a file from start to end (albeit rather efficiently).

XFS implements something called “unwritten extents”. An unwritten extent says “this range of blocks is allocated to this file. If reading from this range, return a zero page. If writing, split the unwritten extent into 3 parts: before, the newly written extent (which isn’t unwritten: i.e. now valid data), and the after extent.” Simple, rather efficient and gets really good allocation as XFS gets to search the free space btrees based on size.

So what to do on Win32 (apart from drink heavily to try and make it all go away)?

There’s SetFileValidData, but that needs special permissions and may expose previously deleted data from other users. i.e. massive security hole. FAIL

There’s SetEndOfFile which, quoting the MS docs: “If the file is extended, the contents of the file between the old end of the file and the new end of the file are not defined.” Not exactly reassuring… but introduced in W2k, so rather safe to use today. Doesn’t save you from having to fill the file with zeros as part of initialisation though.

There’s SetFileInformationByHandle, which looks like it may do exactly what I want… if you read between the lines of the documentation. But it’s only supported starting with Vista. Which you all use of course, so that’s not a problem.

timeout units

Following a discussion on mythtv on #xfs (as you do), and a wondering of “hrrm… i wonder what unit that timeout is” with some NDB code I wish to make the following announcement:

All timeout values in NDB related APIs will now be given in centijiffies of the server system. For APIs that can talk to multiple hosts, it will be furlongs per fortnight.

I feel that having a consistent interface such as this will lead to much less confusion and better apps.